Re: git grep: ^$ false match at end of file
From: Jeff King <hidden>
Date: 2025-01-10 12:02:24
On Fri, Jan 10, 2025 at 06:43:08AM -0500, Jeff King wrote:
> I'll stop digging on it for now (but adding Junio to the cc as the author there). Probably it would have been faster just to start with a debugger than to look through the history. ;)
OK, my curiosity got the better of me. This fixes it:
diff --git a/grep.c b/grep.c
index 4e155ee9e6..9eac3dd95d 100644
--- a/grep.c
+++ b/grep.c
@@ -1470,10 +1470,12 @@ static int look_ahead(struct grep_opt *opt,
 		hit = patmatch(p, bol, bol + *left_p, &m, 0);
 		if (hit < 0)
 			return -1;
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
+		if (m.rm_so == *left_p)
+			continue; /* don't match nothing */
 		if (earliest < 0 || m.rm_so < earliest)
 			earliest = m.rm_so;
 	}
 
 	if (earliest < 0) {
but it is weird to me that patmatch() will match "^$" to the end of the buffer at all. It is just calling regexec_buf() behind the scenes, so I guess this is just a weird special case there, and may even depend on the regex implementation. If I pass "-P" to use pcre instead, the problem goes away even without my patch.

If we skip look-ahead the problem also goes away. I'd have thought match_line() would have the same problem, but there we process line by line, and regexec_buf() never even sees the newline.

So I guess the rationale is: some regexec implementations are weird about this special regex, and we should not trust their result with it on a whole buffer with newlines.

-Peff
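
As a rough illustration of the check in the patch above, here is a minimal standalone sketch. It is not code from grep.c: first_match_offset() is a hypothetical helper, and it uses plain POSIX regcomp()/regexec() on a NUL-terminated string rather than regexec_buf(). The REG_NEWLINE flag is an assumption made so that "^" and "$" can anchor at newlines inside a multi-line buffer. The helper discards a match that starts exactly at the end of the buffer, so the result does not depend on whether a given regexec implementation reports an empty match there.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of grep.c): return the offset of the
 * first match of `pattern` in the NUL-terminated `buf`, or -1 if the
 * pattern does not compile, nothing matches, or the only match starts
 * at the very end of the buffer.
 */
long first_match_offset(const char *pattern, const char *buf)
{
	regex_t re;
	regmatch_t m;
	size_t len = strlen(buf);
	long ret = -1;

	/* Assumption: REG_NEWLINE, so ^ and $ anchor at embedded newlines. */
	if (regcomp(&re, pattern, REG_NEWLINE) != 0)
		return -1;

	if (regexec(&re, buf, 1, &m, 0) == 0 && m.rm_so >= 0) {
		/* Same idea as the patch: a match starting at the end matches nothing. */
		if ((size_t)m.rm_so != len)
			ret = (long)m.rm_so;
	}

	regfree(&re);
	return ret;
}
```

With this helper, searching "foo\nbar\n" for "^$" yields -1 whether or not the underlying regexec reports the end-of-buffer match, while a real empty line in the middle of the buffer is still found.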