Re: [GSoC RFC PATCH v2 1/7] repo-info: declare the repo-info command
From: Justin Tobler <hidden>
Date: 2025-07-09 20:11:11
On 25/07/07 08:01AM, Patrick Steinhardt wrote:
> On Fri, Jul 04, 2025 at 06:40:11PM -0300, Lucas Seiki Oshiro wrote:
> > > Would it make sense to maybe have such whole-repo commands grouped
> > > together in a `git repo` top-level command? E.g. `git repo info`
> > > for your command, `git repo size` to gather information about the
> > > repo size.
> >
> > It seems to be very nice for me! In fact, this being a home also for
> > statistics is something I considered while writing the first
> > versions of my GSoC proposal.
> >
> > And what about merging the two codes into a single API? Something
> > like:
> >
> >     git repo-info layout.bare references.format survey.commit-count
> >     {
> >       "layout": { "bare": true },
> >       "references": { "format": "files" },
> >       "survey": { "commit-count": 42 }
> >     }
> >
> > ?
>
> We could in theory do that. But there's two things we need to be
> cautious about:
>
>   1. We should be mindful about what specifically this tool is about.
>      It shouldn't become the next tool that does way too many
>      different things.
>
>   2. One of the ideas of git-survey(1) is to eventually replace
>      git-sizer(1). This will require very specific presentation
>      formats that aren't really compatible with any of the other
>      information.
>
> Out of these two I think the second item is the more important one why
> git-survey(1) should exist as a standalone tool, either as a top-level
> command or as a subcommand.

As Patrick mentioned, the focus for git-survey(1) is to be an eventual
substitute for git-sizer(1). For the initial implementation I was
imagining a simple plaintext format that outputs key/value pairs and
looks something like the following example:

    references.branches.count=15
    references.tags.count=2
    references.remotes.count=5
    references.others.count=1
    objects.commits.count=50
    objects.commits.total_size=1234567
    objects.commits.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
    objects.commits.max_size.size=1234
    objects.commits.max_parents.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
    objects.commits.max_parents.count=8
    objects.trees.count=100
    objects.trees.total_size=12345
    objects.trees.total_tree_entries=999
    objects.trees.max_tree_entries.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
    objects.trees.max_tree_entries.count=99
    objects.blobs.count=142
    objects.blobs.total_size=99999999
    objects.blobs.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
    objects.blobs.max_size.size=999999
    objects.tags.count=1
    repo.max_depth=999
    <etc...>

The command will also need to eventually support other output formats,
namely a more human-friendly table format that provides something
similar to git-sizer(1).

As laid out above, this looks like it could also work well with the
git-repo-info(1) JSON format. This makes me wonder if we should add this
functionality as a separate flag for git-repo-info(1). Maybe something
like `--stats`, which would append the info to the output. If we want a
clearer distinction though, we could implement this as a separate
subcommand.

For a more human-readable format, maybe we could still implement a
standalone git-survey(1) that is more of a porcelain command and uses
git-repo-info(1) under the hood. I think the other information such as
reference format and object format may be useful to provide in
git-survey(1) output.

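As a rough illustration of why the dotted key/value lines above would
fit the nested JSON shape, here is a minimal Python sketch. The helper
name `kv_to_nested` is hypothetical and not part of any Git command. It
assumes one `key=value` pair per line, that no key is both a leaf and a
prefix of another key, and that all-digit values are counts or sizes.

```python
import json


def kv_to_nested(lines):
    """Fold dotted key=value lines into a nested dict (hypothetical helper)."""
    root = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not a key/value pair
        key, value = line.split("=", 1)
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        # Assumption: purely numeric values are counts or sizes.
        node[leaf] = int(value) if value.isdigit() else value
    return root


sample = """\
references.branches.count=15
objects.commits.count=50
objects.commits.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
objects.commits.max_size.size=1234
"""

print(json.dumps(kv_to_nested(sample.splitlines()), indent=2))
```

Each dotted prefix becomes one nested object, so a flat dump and a JSON
document could in principle be two renderings of the same data.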
> > During our meetings, Karthik suggested (I'm planning to do it later)
> > to also allow to request an entire category instead of only the
> > fields. Then, this would also be possible:
> >
> >     $ git repo-info survey
> >     {
> >       "survey": {
> >         "commit-count": 42,
> >         "blob-count": 1234
> >       }
> >     }
>
> It raises another question though: if we ever were to add `--all` we'll
> need to step a bit careful about what kind of information we add to
> this tool. All of the information proposed so far can be computed
> rather trivially. But computing repository sizes has way higher
> computational complexity and may easily take seconds, maybe even
> minutes in large repositories.
>
> That to me further points into the direction of giving those two tools
> a common top-level command (`git repo info`, `git repo survey`), but to
> not mix concerns too much with one another.

Getting the info for git-survey(1) is certainly more computationally
complex, so there should be a way to run the command without performing
the more expensive checks if the user doesn't want them. At the same
time, I think it may be nice to have a way for a user to request a dump
of "interesting" repository info via a single command.

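One way to picture that split is a per-field cost tag, sketched below in
Python. Everything here is hypothetical: the `FIELDS` registry, the
`collect` function and the `allow_expensive` switch are illustrative
names only, and the lambdas are stubs standing in for the real
computations, with values borrowed from the example earlier in the
thread.

```python
# Hypothetical field registry: names, cost tags and values are illustrative.
FIELDS = {
    "layout.bare": ("cheap", lambda: True),
    "references.format": ("cheap", lambda: "files"),
    "survey.commit-count": ("expensive", lambda: 42),
}


def collect(requested, allow_expensive=False):
    """Return values for requested keys, skipping expensive ones unless allowed."""
    result = {}
    for key in requested:
        cost, compute = FIELDS[key]
        if cost == "expensive" and not allow_expensive:
            continue  # the costly computation is never started
        result[key] = compute()
    return result


print(collect(FIELDS))                        # cheap fields only
print(collect(FIELDS, allow_expensive=True))  # full dump, including survey data
```

With a split like this, a "dump everything interesting" mode stays cheap
by default and only pays for the object walk when asked to.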
> > But I don't know what are Justin's plans for git-survey, if it would
> > be a porcelain command for showing those stats to the user or if it
> > is targeted for being parsed like this `repo-info`.

I think the intent for git-survey was to provide a more porcelain
command to display interesting repository stats to the user, but also
provide an option to print in a machine-parsable format.

I like the idea of computing everything as part of git-repo-info though.
This could allow a standalone git-survey to focus on just being a
human-friendly porcelain command. For scripted use-cases, users could
then just use git-repo-info.

-Justin