Thread: 3 messages, 2 authors, 2025-01-10

Re: git grep: ^$ false match at end of file

From: Jeff King <hidden>
Date: 2025-01-10 12:02:24
Subsystem: the rest · Maintainer: Linus Torvalds

On Fri, Jan 10, 2025 at 06:43:08AM -0500, Jeff King wrote:

> I'll stop digging on it for now (but adding Junio to the cc as the
> author there). Probably it would have been faster just to start with a
> debugger than to look through the history. ;)

OK, my curiosity got the better of me. This fixes it:

diff --git a/grep.c b/grep.c
index 4e155ee9e6..9eac3dd95d 100644
--- a/grep.c
+++ b/grep.c
@@ -1470,10 +1470,12 @@ static int look_ahead(struct grep_opt *opt,
 		hit = patmatch(p, bol, bol + *left_p, &m, 0);
 		if (hit < 0)
 			return -1;
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
+		if (m.rm_so == *left_p)
+			continue; /* don't match nothing */
 		if (earliest < 0 || m.rm_so < earliest)
 			earliest = m.rm_so;
 	}
 
 	if (earliest < 0) {

but it is weird to me that patmatch() will match "^$" to the end of the
buffer at all. It is just calling regexec_buf() behind the scenes, so I
guess this is just a weird special case there, and may even depend on
the regex implementation. If I pass "-P" to use pcre instead, the
problem goes away even without my patch.

If we skip look-ahead the problem also goes away. I'd have thought
match_line() would have the same problem, but there we process line by
line, and regexec_buf() never even sees the newline.

So I guess the rationale is: some regexec implementations are weird
about this special regex, and we should not trust their result with it
on a whole buffer with newlines.
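
To illustrate the rationale above outside of git, here is a minimal sketch
using plain POSIX regcomp()/regexec() on a NUL-terminated buffer. The helper
name first_real_match() is hypothetical and not part of grep.c, and it
assumes the pattern is compiled with REG_NEWLINE; it also uses regexec()
directly rather than regexec_buf(). It only shows the idea of discarding an
empty match that lands exactly at the end of the buffer, whether or not a
given regex implementation reports one there.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not from grep.c): return the offset of the first
 * match of `pattern` in `buf`, ignoring a match that starts exactly at
 * the end of the buffer. Returns -1 if there is no usable match and -2
 * if the pattern fails to compile.
 */
static long first_real_match(const char *pattern, const char *buf)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	long result = -1;

	assert(pattern && buf);
	len = strlen(buf);

	/* REG_NEWLINE lets ^ and $ anchor at newlines inside the buffer. */
	if (regcomp(&re, pattern, REG_NEWLINE) != 0)
		return -2;

	/*
	 * regexec() reports the leftmost match. If that match starts at
	 * the end of the buffer, it matched nothing, so treat it as no hit
	 * (the same idea as the "don't match nothing" check in the patch).
	 */
	if (regexec(&re, buf, 1, &m, 0) == 0 && (size_t)m.rm_so != len)
		result = (long)m.rm_so;

	regfree(&re);
	return result;
}
```

With this filter, first_real_match("^$", "foo\n") reports no match regardless
of whether the underlying regexec() matches the empty string after the final
newline, while a buffer that really contains an empty line, such as
"foo\n\nbar\n", still yields the offset of that line.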

-Peff