Thread (2 messages) 2 messages, 2 authors, 2023-01-29

Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails

From: Mathias Krause <hidden>
Date: 2023-01-29 13:36:41

On 27.01.23 18:39, Junio C Hamano wrote:
Junio C Hamano [off-list ref] writes:
quoted
Yes, the "instead of failing hard, fall back" makes sense.  Just
that I do not see why the runtime test is a good thing to have.  In
short, we are not in the business of catching bugs in pcre2_jit
implementations, so if they say they cannot compile the pattern (I
would even say I doubt the point of checking the return code to
ensure it is NOMEMORY), it would be fine to let the interpreter
codepath to inspect the pattern and diagnose problems with it, or
take the slow match without JIT.

What am I missing?
Note that I've seen and recently re-read the discussion that leads to
https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ (local)
Ahh, so ignore my last advise in the previous Email. You already read it.
I suspect that this auto-probe is related to solving "the user
thinks JIT is in use but because of failing JIT the user's pattern
is getting horrible performance" somehow.  But I do not think a hard
failure is a good approach to help users in such a situation.
Yes, it's exactly trying to detect this case and not cause a regression
for users expecting 'git grep' to make use of the JIT.

So, beside the W|X memory allocation error, other errors are likely only
to point out limitations of the JIT compiler, e.g. the example I gave in
https://lore.kernel.org/all/2b04b19a-a2bd-3dd5-6f21-ed0b0ad3e02f@grsecurity.net/ (local),
which is, admitted, a made up example that can easily be worked around
by manually prefixing it with '(*NO_JIT)'. It would be a pain having to
do that for *every* pattern, but only doing it for the pathological
cases should be fine. Otherwise more users would have run into it
already and complained about it, no?
After such a failure, the user can prefix "(*NO_JIT)" to the pattern
and retry, or give up the operation altogether and not get a useful
result, but wouldn't it be far more helpful to just fallback as if
(*NO_JIT) was on from the beginning?
Sure, if it would be required for *every*, i.e. "normal" patterns. But
always doing it even for abusive ones doesn't feel right.
Also I notice that p->pcre2_jit_on is per "struct grep_pat", so it
is not like "once we see a pathological pattern, we turn off JIT
completely for other patterns", right?  That is, if you have

    git grep -P -e "$A" -e "$B"

and we fail to compile "$A" (for whatever reason), we could still
(attempt to) compile "$B".  Perhaps $A was too complex or was
incompatible with JIT combined with other options, but $B may be
easy enough to still be JITtable, in which case we would match with
the JITted version of $B with interpreted version of $A, instead of
failing, right?
The current version of git would fail hard if it fails to JIT compile
"$A". My patch doesn't change that behavior and that's intentional, as I
share Ævar's thinking about falling back to the interpreter mode for
"complex patterns" (which means pathological cases, really) is a bad
idea. While we might be able to compile the pattern and run it in
interpreter mode, it'll likely have a *much* higher runtime.

Just to get you a glimpse of how much longer, this is what it takes
running the pathological pattern from above's example on git.git:

  $ time git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')"
  Binary file git-gui/macosx/git-gui.icns matches
  Binary file t/t5000/pax.tar matches
  Binary file t/t5004/big-pack.zip matches
  Binary file t/t5004/empty-with-pax-header.tar matches

  real	44m42,150s
  user	577m14,623s
  sys	0m46,210s


So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to
fall back to something like this for the pathological cases? ...Yeah, I
don't think so either.

Thanks,
Mathias
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help