Thread (4 messages) 4 messages, 2 authors, 2025-10-22

Re: [PATCH] match_pathname(): give fnmatch one char of prefix context

From: Sruteesh Kumar <hidden>
Date: 2025-10-22 16:19:57

Although this feels like a hack, it fixes the issue without impact much of the code. Otherwise, we would get regression issues. Someone from git community, please check this once.




Sent with Proton Mail secure email.

On Tuesday, October 14th, 2025 at 6:04 AM, Jeff King [off-list ref] wrote:
quoted hunk ↗ jump to hunk
On Fri, Oct 10, 2025 at 02:57:07PM +0000, Sruteesh Kumar wrote:
quoted
Look at the first scenario in the above link. Git is matching the path
foobar with the pattern foo**/bar which is against the git's official
documentation (Look at the last point in the double asterisk section
at the URL https://git-scm.com/docs/gitignore#_pattern_format).

Is this an issue with the code or the documentation?

I think the code is buggy. Here's a patch which would fix it, but I've
marked it as RFC because:

1. I haven't entirely convinced myself that there aren't more
complicated variants of the same problem.

2. It's kind of a disgusting hack.

-- >8 --

Subject: match_pathname(): give fnmatch one char of prefix context

In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with with fnmatch
patterns (actually our extended wildmatch, which will be important).
There's an extra optimization there: we pre-compute the number of
non-wildcard characters at the beginning of the pattern and do an
fspathncmp() on that prefix.

That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().

But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "/bar" and
"bar", which it does consider a match, because "/" can match zero
directories.

We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o/bar" and
"obar", and realize that they cannot match.

In the case that there are no wildcards at all (i.e., the whole prefix
matches), we'll continue to return without running fnmatch at all. We
just need to account for the extra byte in our adjusted lengths.

Signed-off-by: Jeff King peff@peff.net

---
I wonder how much this prefix-matching buys us in practice. There are
two cases that are helped:

1. When there is no wildcard in the pattern at all, we skip fnmatch
entirely.

2. We do a raw match of the prefix chars, shrinking the size of what
is passed to fnmatch.

My suspicion is that most of the improvement comes from (1), and it
would be very easy to retain that case and get rid of (2). But I haven't
done any measuring.

dir.c | 9 ++++++++-
t/t0008-ignores.sh | 11 +++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/dir.c b/dir.c
index 0a67a99cb3..764400d9c5 100644
--- a/dir.c
+++ b/dir.c
@@ -1360,6 +1360,13 @@ int match_pathname(const char pathname, int pathlen,
if (fspathncmp(pattern, name, prefix))
return 0;
+
+ /
+ * Retain one character of the prefix to
+ * pass to fnmatch, which lets it distinguish
+ * the start of a directory component correctly.
+ */
+ prefix--;
pattern += prefix;
patternlen -= prefix;
name += prefix;
@@ -1370,7 +1377,7 @@ int match_pathname(const char pathname, int pathlen,
* then our prefix match is all we need; we
* do not need to call fnmatch at all.
/
- if (!patternlen && !namelen)
+ if (patternlen == 1 && namelen == 1)
return 1;
}
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 273d71411f..db8bde280e 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
test_cmp expect actual
'

+test_expect_success ' not confused by matching leading prefix' '
+ cat >.gitignore <<-\EOF &&

+ foo**/bar
+ EOF
+ git check-ignore foobar foo/bar >actual &&

+ cat >expect <<-\EOF &&

+ foo/bar
+ EOF
+ test_cmp expect actual
+'
+
############################################################################
#
# test whitespace handling
--
2.51.0.754.gd4f5ded95f
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help