Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Mathias Krause <hidden>
Date: 2023-01-29 13:36:41
On 27.01.23 18:39, Junio C Hamano wrote:
Junio C Hamano [off-list ref] writes:quoted
Yes, the "instead of failing hard, fall back" makes sense. Just that I do not see why the runtime test is a good thing to have. In short, we are not in the business of catching bugs in pcre2_jit implementations, so if they say they cannot compile the pattern (I would even say I doubt the point of checking the return code to ensure it is NOMEMORY), it would be fine to let the interpreter codepath to inspect the pattern and diagnose problems with it, or take the slow match without JIT. What am I missing?Note that I've seen and recently re-read the discussion that leads to https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ (local)
Ahh, so ignore my last advise in the previous Email. You already read it.
I suspect that this auto-probe is related to solving "the user thinks JIT is in use but because of failing JIT the user's pattern is getting horrible performance" somehow. But I do not think a hard failure is a good approach to help users in such a situation.
Yes, it's exactly trying to detect this case and not cause a regression for users expecting 'git grep' to make use of the JIT. So, beside the W|X memory allocation error, other errors are likely only to point out limitations of the JIT compiler, e.g. the example I gave in https://lore.kernel.org/all/2b04b19a-a2bd-3dd5-6f21-ed0b0ad3e02f@grsecurity.net/ (local), which is, admitted, a made up example that can easily be worked around by manually prefixing it with '(*NO_JIT)'. It would be a pain having to do that for *every* pattern, but only doing it for the pathological cases should be fine. Otherwise more users would have run into it already and complained about it, no?
After such a failure, the user can prefix "(*NO_JIT)" to the pattern and retry, or give up the operation altogether and not get a useful result, but wouldn't it be far more helpful to just fallback as if (*NO_JIT) was on from the beginning?
Sure, if it would be required for *every*, i.e. "normal" patterns. But always doing it even for abusive ones doesn't feel right.
Also I notice that p->pcre2_jit_on is per "struct grep_pat", so it
is not like "once we see a pathological pattern, we turn off JIT
completely for other patterns", right? That is, if you have
git grep -P -e "$A" -e "$B"
and we fail to compile "$A" (for whatever reason), we could still
(attempt to) compile "$B". Perhaps $A was too complex or was
incompatible with JIT combined with other options, but $B may be
easy enough to still be JITtable, in which case we would match with
the JITted version of $B with interpreted version of $A, instead of
failing, right?The current version of git would fail hard if it fails to JIT compile "$A". My patch doesn't change that behavior and that's intentional, as I share Ævar's thinking about falling back to the interpreter mode for "complex patterns" (which means pathological cases, really) is a bad idea. While we might be able to compile the pattern and run it in interpreter mode, it'll likely have a *much* higher runtime. Just to get you a glimpse of how much longer, this is what it takes running the pathological pattern from above's example on git.git: $ time git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" Binary file git-gui/macosx/git-gui.icns matches Binary file t/t5000/pax.tar matches Binary file t/t5004/big-pack.zip matches Binary file t/t5004/empty-with-pax-header.tar matches real 44m42,150s user 577m14,623s sys 0m46,210s So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to fall back to something like this for the pathological cases? ...Yeah, I don't think so either. Thanks, Mathias