Thread (1 message) 1 message, 1 author, 2025-11-10

Re: [PATCH v2 07/12] diff: update the way rewrite diff handles incomplete lines

From: Junio C Hamano <hidden>
Date: 2025-11-10 18:14:07

Patrick Steinhardt [off-list ref] writes:
On Wed, Nov 05, 2025 at 01:30:47PM -0800, Junio C Hamano wrote:
quoted
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum
value per the kind of output lines of "git diff", which corresponds
to one output line from the xdiff machinery used internally.  Most
notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to
"+" and "-" lines are designed to always take a complete line, even
"complete line" as in newline-terminated? I only recognized that this is
what you meant in the next paragraph, so it might be useful to clarify
here already what you mean.
Yes, "incomplete line" is a defined term people can look up in
places like POSIX.1 [*] but I do not know of an official word to
refer to the opposite.  Would it work if I rephrase it to say
"...designed to always end in a newline character, even..."?

(https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_172)
quoted
diff --git a/diff.c b/diff.c
index 347cd9c6e9..99298720f4 100644
--- a/diff.c
+++ b/diff.c
@@ -1786,22 +1777,36 @@ static void emit_rewrite_lines(struct emit_callback *ecbdata,
 	const char *endp = NULL;
 
 	while (0 < size) {
...
 	}
-	if (!endp)
-		emit_diff_symbol(ecbdata->opt, DIFF_SYMBOL_NO_LF_EOF, NULL, 0, 0);
+	if (!endp) {
+		static const char nneof[] = "\\ No newline at end of file\n";
+		ecbdata->last_line_kind = prefix;
+		emit_incomplete_line(ecbdata, nneof, sizeof(nneof) - 1);
+	}
 }
Okay. I was wondering at first how this would get executed for both
pre- and postimage if it's not part of the loop anymore. But this is
mostly showing my complete ignorance for the "diff" subsystem, as we end
up calling `emit_rewrite_lines()` itself once for each image.
The idea is to make a "complete rewrite" patch (i.e. what "diff -B"
decides that it is more confusing to express the postimage in terms
of "here are remaining pieces of the preimage, many lines around here
were removed from the preimage and then many new lines are inserted"
than "ok, we are removing everything in the preimage and then we are
replacing them with these lines to form the postimage".

This function is called twice, once to show a bunch of "-removed"
lines for the preimage side, and then again to show a bunch of
"+added" lines for the postimage side.  The loop iterates over these
lines in a single image, and at the end, the last line of the image,
whether it is the preimage or the postimage, may not end in a newline,
in which case we need to append "\ No newline" after it.

I just realize that emit_incomplete_line() may be a misnomer.  It is
not used to show the last line in the pre/postimage that was
incomplete.  The loop gives all lines, even the final incomplete
one, as if each of them ended in a newline.  What the helper
function emit_incomplete_line() does is to show an additional "by
the way, the previous line was an incomplete line" marker after the
contents of the line gets shown.

Perhaps call it emit_incomplete_line_mark() or something, and it
would make it easier to follow what is going on?
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help