Thread: 12 messages, 4 authors, 2016-06-15

Re: [RFC] git blame-tree

From: Will Palmer <hidden>
Date: 2016-06-15 22:50:42

On Wed, 2011-03-02 at 11:40 -0500, Jeff King wrote:
[I know, I know, another RFC. I'll get to actually cleaning up and
submitting some of these patches soon.]

It's sometimes useful to get a list of files in a tree along with the
last commit that touched them. This is the default tree view shown on
github.com, but it can also be handy from the command line (there has
been talk lately of having a "git ls"), or as plumbing for a local
fancier tree view. E.g., something like:

     add.c 6e7293e git-add: make -A description clearer vs. -u
   apply.c fd03881 add description parameter to OPT__VERBOSE
   blame.c 9ca1169 parse-options: Don't call parse_options_check() so much
  branch.c 62270f6 branch_merged: fix grammar in warning
  bundle.c 62b4698 Use angles for placeholders consistently

The obvious naive way to do this is something like:

  for i in `git ls-tree --name-only HEAD`; do
    echo "`git rev-list -1 --no-merges HEAD -- $i` $i";
  done

which is really slow, because we end up traversing the same commits many
times (plus the startup overhead for each rev-list).  It takes about 35
seconds to run on git.git.

So the next obvious thing is to do one traversal, output the changed
files for each commit, and then mark each file as you see it. The perl
script below does this (though the careful reader will note it is
actually buggy with sub-trees; I didn't bother fixing it since it was
just a stage in the evolution):
[code snipped]
This runs in about 3 seconds. But besides the above-mentioned bug, it
also doesn't properly handle things like filenames that need quoting.
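
Since the script itself is snipped here, the single-traversal idea can be
sketched roughly as follows. This is not the Perl script from the original
message, only a minimal shell/awk illustration of "walk history once and
mark each file the first time you see it". The helper name `last_touch` and
the `commit ` marker in the log format are assumptions made for this
sketch. It collapses each path to its top-level entry so the result lines
up with a top-level tree listing, and like the script described above it
does not cope with filenames that need quoting.

```shell
# last_touch: hypothetical helper, not part of git.
# Reads newest-first log output on stdin, where each commit is introduced
# by a line "commit <id> <subject>" followed by the paths it changed, and
# prints "<top-level entry><TAB><id> <subject>" the first time each entry
# is seen.
last_touch() {
  awk '
    /^commit / { id = substr($0, 8); next }  # remember the current commit
    $0 == ""   { next }                      # skip separator lines
    {
      path = $0
      sub(/\/.*/, "", path)                  # keep only the top-level entry
      if (!(path in seen)) {                 # first sighting = last change
        seen[path] = 1
        print path "\t" id
      }
    }
  '
}

# Assumed invocation (one traversal instead of one rev-list per file):
#   git log --no-merges --format='commit %h %s' --name-only HEAD | last_touch
```

Unlike a real implementation, this sketch keeps reading history even after
every entry has been assigned a commit; stopping early would need the list
of entries up front (e.g. from "git ls-tree --name-only HEAD").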

So I wrote it in C, which drops the time down to about 1.5 seconds, and
of course doesn't have any parsing issues.  The patch is below.

I wasn't sure at first what to call it or what the calling conventions
should be. The initial thought was to make it part of "ls-tree". But
that feels wrong, as ls-tree otherwise never cares about traversal. The
combination of traversal and diff made me think of blame, and indeed, I
think this is really just about blaming a whole tree at the file-level,
rather than at the content-level. Thus I called it blame-tree, and I
used the same calling conventions as blame: "git blame-tree <path>
<rev opts>". See the test script for examples.

I have many thoughts on the patch already, but rather than put them
here, I'll include the patch without further ado, and put them inline in
a reply.
[patch snipped]

Coincidentally, I'm doing a similar thing in a shell script at the
moment. Unfortunately, no tree-object is involved: I'm instead using the
output from "git diff" on two different branches to generate a list of
files I care about. How hard would it be to accept a nul-delimited list
of filenames via stdin, rather than from a tree? If I'm reading this
right, it looks like a pretty trivial change. (I couldn't get the
existing patch to apply myself. I assume I'm just doing something
wrong, as I don't need to use "git am" very often.)