Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
From: Phillip Wood <hidden>
Date: 2024-07-18 14:23:13
Possibly related (same subject, not in this thread)
- 2024-07-11 · Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty · Jeff King <hidden>
- 2024-07-10 · Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty · Junio C Hamano <hidden>
On 11/07/2024 22:26, Jeff King wrote:
On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:quoted
quoted
It's tempting to say that we should just make sure that we generate a diff without that option. But we may parse diffs not only generated by Git, but ones that users have manually edited. And POSIX calls the decision of whether to print the space here "implementation-defined".Do we ever parse an edited hunk with this code? I'm not sure there is a way of splitting a hunk after it has been edited as I don't think we ever display it again.Hmm, I just assumed this code would see the edited hunk, but now I'm not sure. Note that the changes required do go outside of split_hunk(); the initial parse_diff() needs to decide whether the hunk is splittable in the first place (as an aside, that puzzled me at first why just changing split_hunk() was enough for the case that started this thread, but not the one in the included test. The difference is that without the empty line as context, the hunk in the test wouldn't be splittable at all). But looking closer: yes, I do think parse_diff() is used only for the initial patch.
That's true but I've realized that we do in fact allow the user to re-display and split an edited hunk. If there are two changes in a file then you can edit the first hunk and when prompted about the second press 'K' to go back to the first one and then split it with 's' (I messed up my test for this before sending my previous mail as I changed two separate files rather than putting two changes in a single file). So split_hunk() can encounter empty context lines even without diff.suppressBlankEmpty being set as lots of editors strip the leading space when the rest of the line is empty[1]. As we haven't had any bug reports about that I suspect people are not splitting the hunks they edited which I guess makes sense. I think there is another bug lurking for anyone who does try to split an edited hunk as we don't update `hunk->splittable_into` after it has been edited and the edit might change a deleted of an empty line to a context line or vice versa. We should make sure any garbage at the end of the hunk is discarded as well so we don't show it when we display the edited hunk. Best Wishes Phillip [1] When I added the code to recount the hunk header rather than relying on "git apply --recount" initially it did not support empty context lines and we received quite a few bug reports pretty quickly after it was released.
So we really would only see git-generated patches here. Which I think takes away my ambiguity concern, but does mean the commit message is wrong.quoted
quoted
I don't think we'd ever generate this ourselves, but could somebody manually edit a hunk into this shape? When I tried it in practice, it looks like we fail to apply the result even before my patch, though. I'm not sure why that is. If I put "some garbage" without the blank line, we correctly realize it should be discarded. It's possible I'm just holding it wrong.When we recount the hunk after it has been edited we ignore lines that don't begin with '+', '-', ' ', or '\n' so if you add some garbage at the end of the hunk the recounted hunk header excludes it when it gets applied.OK, that makes sense. And we could never rely on the hunk header in the edited hunk anyway, since the whole point is that we have to recount it. So the user must accept that an extra blank line is potential context (and that goes all the way back to bcdd297b78 (built-in add -p: implement hunk editing, 2019-12-13).quoted
I think your patch looks good. I did wonder if we wanted to fix this by normalizing context lines instead as shown in the diff below. That might make it less likely to miss adding "|| '\n'" in future code that is looking for a context line but I don't have a strong preference either way.Yeah, I had a similar thought, but it got messy because we have to deal with the source buffer. But the extra "char ch" you added in the patch below fixes that. I think the result is better. Looking at the blank-line handling in recount_edited_hunk(), we also handle a CRLF empty line there. Should we do so here, too? If so, then it would just be a matter of touching normalize_marker() in your patch. Do you want to just re-send your patch with a commit message to replace mine? (Feel free to steal the non-wrong parts of my message ;) ).quoted
---- >8 ----diff --git a/add-patch.c b/add-patch.c index d8ea05ff108..795aa772b7a 100644 --- a/add-patch.c +++ b/add-patch.c@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk) hunk->splittable_into++; } +/* Empty context lines may omit the leading ' ' */ +static int normalize_marker(char marker) +{ + return marker == '\n' ? ' ' : marker; +} + static int parse_diff(struct add_p_state *s, const struct pathspec *ps)Minor nit: missing blank line between functions. -Peff