Thread: 178 messages, 10 authors, 2025-08-16

Re: [GSoC RFC PATCH v2 1/7] repo-info: declare the repo-info command

From: Justin Tobler <hidden>
Date: 2025-07-09 20:11:11

On 25/07/07 08:01AM, Patrick Steinhardt wrote:
On Fri, Jul 04, 2025 at 06:40:11PM -0300, Lucas Seiki Oshiro wrote:
Would it make sense to maybe have such whole-repo commands
grouped together in a `git repo` top-level command? E.g. `git repo info`
for your command, `git repo size` to gather information about the repo
size.
It seems to be very nice for me! In fact, this being a home also for
statistics is something I considered while writing the first versions of
my GSoC proposal.

And what about merging the two into a single API? Something like:
git repo-info layout.bare references.format survey.commit-count
{
  "layout": {
    "bare": true
  },
  "references": {
    "format": "files"
  },
  "survey": {
    "commit-count": 42
  }
}

?
We could in theory do that. But there's two things we need to be
cautious about:

  1. We should be mindful about what specifically this tool is about. It
     shouldn't become the next tool that does way too many different
     things.

  2. One of the ideas of git-survey(1) is to eventually replace
     git-sizer(1). This will require very specific presentation formats
     that aren't really compatible with any of the other information.

Out of these two I think the second item is the more important reason why
git-survey(1) should exist as a standalone tool, either as a top-level
command or as a subcommand.
As Patrick mentioned, the focus for git-survey(1) is to be an eventual
substitute for git-sizer(1). For the initial implementation I was
imagining a simple plaintext format that outputs key/value pairs and
looks something like the following example:

  references.branches.count=15
  references.tags.count=2
  references.remotes.count=5
  references.others.count=1
  objects.commits.count=50
  objects.commits.total_size=1234567
  objects.commits.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.commits.max_size.size=1234
  objects.commits.max_parents.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.commits.max_parents.count=8
  objects.trees.count=100
  objects.trees.total_size=12345
  objects.trees.total_tree_entries=999
  objects.trees.max_tree_entries.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.trees.max_tree_entries.count=99
  objects.blobs.count=142
  objects.blobs.total_size=99999999
  objects.blobs.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.blobs.max_size.size=999999
  objects.tags.count=1
  repo.max_depth=999
  <etc...>

The command will also need to eventually support other output formats,
namely a more human friendly table format that provides something
similar to git-sizer(1). As laid out above, this looks like it could
also work well with the git-repo-info(1) JSON format. This makes me
wonder if we should add this functionality as a separate flag for
git-repo-info(1). Maybe something like `--stats` and append the info to
the output. If we want a clearer distinction though, we could
implement this as a separate subcommand.
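
To illustrate why the dotted key/value format lines up with a nested JSON
layout, here is a minimal Python sketch that folds such lines into a nested
dictionary. The function name `parse_survey_output` is hypothetical, and the
sketch assumes no key is both a leaf and a prefix of another key. Values are
treated as integers unless the key ends in `oid`. None of this is an actual
Git interface.

```python
import json


def parse_survey_output(text):
    """Fold dotted key=value lines into a nested dict (illustrative only)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and placeholders such as "<etc...>"
        key, value = line.split("=", 1)
        parts = key.split(".")
        node = result
        # Walk or create one nested dict per dotted component.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Keep object IDs as strings; convert plain decimal values to int.
        if parts[-1] != "oid" and value.isdigit():
            node[parts[-1]] = int(value)
        else:
            node[parts[-1]] = value
    return result


sample = """\
references.branches.count=15
references.tags.count=2
objects.commits.count=50
objects.commits.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
objects.commits.max_size.size=1234
"""

nested = parse_survey_output(sample)
print(json.dumps(nested, indent=2))
```

The same nested structure could then be serialized next to the existing
`layout` and `references` categories, assuming the JSON output keeps one
top-level object per category.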

For a more human-readable format, maybe we could still implement a
standalone git-survey(1) that is more of a porcelain command and uses
git-repo-info(1) under the hood. I think the other information such as
reference format and object format may be useful to provide in
git-survey(1) output.
During our meetings, Karthik suggested (I'm planning to do it later) also
allowing an entire category to be requested instead of only individual
fields. Then, this would also be possible:
$ git repo-info survey
{
  "survey": {
    "commit-count": 42,
    "blob-count": 1234
  }
}
It raises another question though: if we ever were to add `--all` we'll
need to be a bit careful about what kind of information we add to this
tool. All of the information proposed so far can be computed rather
trivially. But computing repository sizes has way higher computational
complexity and may easily take seconds, maybe even minutes in large
repositories.

That to me further points in the direction of giving those two tools a
common top-level command (`git repo info`, `git repo survey`), but to
not mix concerns too much with one another.
Getting the info for git-survey(1) is certainly more computationally
complex so there should be a way to run the command without performing
the more expensive checks if the user doesn't want them. At the same
time, I think it may be nice to have a way for a user to request a dump
of "interesting" repository info via a single command.
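
One way to picture that split is a field registry that marks each value as
cheap or expensive, so a full dump skips the expensive ones unless the user
opts in. The following Python sketch is purely illustrative: the `FIELDS`
table, the `collect` function, and the `include_expensive` switch are all
hypothetical, and the values are the placeholder ones from the examples
earlier in this thread.

```python
# Hypothetical registry: field name -> (is_expensive, function computing it).
# The lambdas return placeholder values; a real tool would inspect the repo.
FIELDS = {
    "layout.bare": (False, lambda: True),
    "references.format": (False, lambda: "files"),
    "survey.commit-count": (True, lambda: 42),
}


def collect(requested=None, include_expensive=False):
    """Compute field values, skipping expensive ones unless asked for.

    Fields named explicitly in `requested` are always computed, since the
    caller asked for them. With no explicit request, every cheap field is
    returned and expensive fields only when `include_expensive` is set.
    """
    out = {}
    if requested:
        for name in requested:
            _expensive, compute = FIELDS[name]
            out[name] = compute()
        return out
    for name, (expensive, compute) in FIELDS.items():
        if expensive and not include_expensive:
            continue
        out[name] = compute()
    return out


print(collect())                        # cheap fields only
print(collect(include_expensive=True))  # full dump, including survey data
```

With this shape, an `--all` style dump stays fast by default, while a user
who wants the "interesting" survey numbers can still get everything from a
single invocation.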
But I don't know what Justin's plans are for git-survey, whether it would be a
porcelain command for showing those stats to the user or whether it is meant
to be parsed like `repo-info`.
I think the intent for git-survey was to provide a more porcelain
command to display interesting repository stats to the user, but also
provide an option to print in a machine-parsable format. I like the idea
of computing everything as part of git-repo-info though. This could
allow a standalone git-survey to focus on just being a human-friendly
porcelain command. For scripted use-cases, users could then just use
git-repo-info.
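
As a rough sketch of that layering, a porcelain front end could take the
machine-parsable key/value lines and only worry about presentation. The
Python below assumes the key/value format shown earlier in this message. The
`render_table` function and its column layout are hypothetical and are not
meant to mirror git-sizer(1) output.

```python
def render_table(text):
    """Render key=value lines as a simple aligned two-column table."""
    rows = [line.strip().split("=", 1)
            for line in text.splitlines() if "=" in line]
    if not rows:
        return ""
    # Size each column to its widest cell, including the header.
    name_width = max(len("Name"), max(len(name) for name, _ in rows))
    value_width = max(len("Value"), max(len(value) for _, value in rows))
    lines = [
        f"{'Name':<{name_width}} | {'Value':>{value_width}}",
        f"{'-' * name_width}-+-{'-' * value_width}",
    ]
    for name, value in rows:
        # Names are left-aligned, values right-aligned for easy scanning.
        lines.append(f"{name:<{name_width}} | {value:>{value_width}}")
    return "\n".join(lines)


plumbing_output = "objects.commits.count=50\nobjects.blobs.count=142"
table = render_table(plumbing_output)
print(table)
```

Keeping the rendering separate like this means the plumbing output can stay
stable for scripts while the human-facing layout is free to change.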

-Justin