Re: [PATCH v2 07/12] diff: update the way rewrite diff handles incomplete lines
From: Junio C Hamano <hidden>
Date: 2025-11-10 18:14:07
Patrick Steinhardt [off-list ref] writes:
On Wed, Nov 05, 2025 at 01:30:47PM -0800, Junio C Hamano wrote:quoted
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum value per the kind of output lines of "git diff", which corresponds to one output line from the xdiff machinery used internally. Most notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to "+" and "-" lines are designed to always take a complete line, even"complete line" as in newline-terminated? I only recognized that this is what you meant in the next paragraph, so it might be useful to clarify here already what you mean.
Yes, "incomplete line" is a defined term people can look up in places like POSIX.1 [*] but I do not know of an official word to refer to the opposite. Would it work if I rephrase it to say "...designed to always end in a newline character, even..."? (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_172)
quoted
diff --git a/diff.c b/diff.c index 347cd9c6e9..99298720f4 100644 --- a/diff.c +++ b/diff.c@@ -1786,22 +1777,36 @@ static void emit_rewrite_lines(struct emit_callback *ecbdata, const char *endp = NULL; while (0 < size) {... } - if (!endp) - emit_diff_symbol(ecbdata->opt, DIFF_SYMBOL_NO_LF_EOF, NULL, 0, 0); + if (!endp) { + static const char nneof[] = "\\ No newline at end of file\n"; + ecbdata->last_line_kind = prefix; + emit_incomplete_line(ecbdata, nneof, sizeof(nneof) - 1); + } }Okay. I was wondering at first how this would get executed for both pre- and postimage if it's not part of the loop anymore. But this is mostly showing my complete ignorance for the "diff" subsystem, as we end up calling `emit_rewrite_lines()` itself once for each image.
The idea is to make a "complete rewrite" patch (i.e. what "diff -B" decides that it is more confusing to express the postimage in terms of "here are remaining pieces of the preimage, many lines around here were removed from the preimage and then many new lines are inserted" than "ok, we are removing everything in the preimage and then we are replacing them with these lines to form the postimage". This function is called twice, once to show a bunch of "-removed" lines for the preimage side, and then again to show a bunch of "+added" lines for the postimage side. The loop iterates over these lines in a single image, and at the end, the last line of the image, whether it is the preimage or the postimage, may not end in a newline, in which case we need to append "\ No newline" after it. I just realize that emit_incomplete_line() may be a misnomer. It is not used to show the last line in the pre/postimage that was incomplete. The loop gives all lines, even the final incomplete one, as if each of them ended in a newline. What the helper function emit_incomplete_line() does is to show an additional "by the way, the previous line was an incomplete line" marker after the contents of the line gets shown. Perhaps call it emit_incomplete_line_mark() or something, and it would make it easier to follow what is going on?