Re: [PATCH 0/2] Make diff plumbing commands respect the indentHeuristic.
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2017-05-09 07:59:00
On Tue, May 9, 2017 at 5:16 AM, Jeff King [off-list ref] wrote:
On Mon, May 01, 2017 at 12:34:38PM +0200, Ævar Arnfjörð Bjarmason wrote:quoted
quoted
I don't know if we would want to be extra paranoid about patch-ids. There is no helping: git rev-list HEAD | git diff-tree --stdin -p | git patch-id --stable because diff-tree doesn't know that it's trying for "--stable" output. But the diffs we compute internally for patch-id could disable the heuristics. I'm not sure if those matter, though. AFAIK those are used only for internal comparisons within a single program. I.e., we never compare them against input from the user, nor do we output them to the user. So they'll change, but I don't think anybody would care.I have a few-million row table with commit_id as one column & patch_id as another. I.e. a commit -> patch_id mapping.Thanks for this data point. It's always interesting to hear about unforeseen uses of the tools. Out of curiosity, how do you generate the patch-ids? Is it with something like diff-tree piped to patch-id?
This:
my $cmd = qq[git --git-dir="$repository_path" log --since="$since"
--until="$until" --all --pretty=format:%H --binary | git patch-id];
open my $patch_id_fh, " $cmd |";
Which is part of a loop that generates since/until for continuous
pull/insertion. Also, a few lines later there's a workaround for the
git.git bug of patch-id being ^0+$ (fixed in 2485eab55c
("git-patch-id: do not trip over "no newline" markers", 2011-02-17)),
which gives you a sense of how long it's been since anyone's touched
this.
I do feel a bit sad about breaking this case (or at the very least forcing you to set an option to retain cross-version compatibility). But my gut says that we don't want to lock ourselves into never changing the diff algorithm (and I'm sure we've done it inadvertently a few times over the years; even the recent switch to turning on renames would have had that impact).
As noted I think it's completely fine to change the patch-ids by changing the diff algorithm. I'm about to give some more detail on this in the other thread, but I find that on our repos the indent heuristic changes the patch-id for around 2% of patches, which seems fairly typical for non-changelog-y code. You *then* need to be using topic branches you didn't delete as well as having authored such a patch for this change to kick in, so the impact is really minimal. Even if it somehow changed 100% of the ids that would be fine too. It would auto-heal as the same git version started reading & inserting the ids, which are only relevant in a moving window.