Thread (2 messages), 2 authors, 2023-01-29

Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails

From: Mathias Krause <hidden>
Date: 2023-01-29 12:28:46

On 27.01.23 17:34, Junio C Hamano wrote:
Mathias Krause [off-list ref] writes:
As having a functional PCRE2 JIT compiler is a legitimate use case for
performance reasons, we'll only do the fallback if the supposedly
available JIT is found to be non-functional by attempting to JIT compile
a very simple pattern. If this fails, JIT is deemed to be non-functional
and we do the interpreter fallback. For all other cases, i.e. the simple
pattern can be compiled but the user provided cannot, we fail hard as we
do now as the reason for the failure must be the pattern itself.
I do not know if it is a good idea to rely on the "very simple
pattern".  The implementation of JIT could devise various ways to
succeed for such simple patterns without having writable-executable
piece of memory.
Well, if PCRE2 JIT ever changes to optimize this case, we would be back
to the error I'm seeing right now. But I doubt that PCRE2 will be doing
optimizations like that. The current implementation does the JIT memory
allocation test very early, even before looking at the pattern:

https://github.com/PCRE2Project/pcre2/blob/pcre2-10.42/src/pcre2_jit_compile.c#L14450-L14465

I can add a call to pcre2_pattern_info(PCRE2_INFO_JITSIZE) if you
really want me to, but IMHO it's not needed.
                  What happened to the earlier idea of falling back
to the interpreted codepath, which will catch any bad pattern that
has "the reason for the failure" by failing anyway?
Ævar's concerns about always falling back to the interpreter mode made
me change the patch like this. Basically what he's concerned about are
two things:
1/ "Crazy patterns" that fail the JIT but work in the interpreter
can be a severe performance regression.
2/ Always falling back to interpreter mode might mask JIT API usage
errors that we'd like to see.

While 1/ could also be seen as a limitation of current 'git grep', I
share Ævar's concerns about runtime regressions. If, for example, some
web interface lets users supply arbitrary grep patterns, abusing
the interpreter mode fallback will consume significantly more CPU
resources than it does right now (where it simply fails with an error).
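
To make the intended behaviour explicit, here is a minimal sketch of the
three outcomes as a pure function. It is not the patch itself: the enum
names and values below are hypothetical stand-ins (the real error codes
come from <pcre2.h>), and probe_ok stands for the result of the "simple
pattern" probe, which the patch only evaluates when it is needed.

```c
/* Hypothetical stand-ins for illustration; real codes live in <pcre2.h>. */
enum { JIT_OK = 0, JIT_ERR_NOMEMORY = 1, JIT_ERR_OTHER = 2 };

enum jit_action { USE_JIT, FALL_BACK_TO_INTERPRETER, FAIL_HARD };

/*
 * jitret:   result of JIT compiling the user's pattern
 * probe_ok: non-zero if JIT compiling a trivial pattern succeeded
 */
enum jit_action decide_jit_action(int jitret, int probe_ok)
{
	if (jitret == JIT_OK)
		return USE_JIT;
	/* Out of JIT memory and even the trivial probe fails: JIT is unusable. */
	if (jitret == JIT_ERR_NOMEMORY && !probe_ok)
		return FALL_BACK_TO_INTERPRETER;
	/* JIT works in general, so the failure is blamed on the pattern. */
	return FAIL_HARD;
}
```

With this split, a pattern that is merely too complex for a working JIT
still fails hard, so it cannot be used to force the slower interpreter.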
+static int pcre2_jit_functional(void)
+{
+	static int jit_working = -1;
+	pcre2_code *code;
+	size_t off;
+	int err;
+
+	if (jit_working != -1)
+		return jit_working;
+
+	/*
+	 * Try to JIT compile a simple pattern to probe if the JIT is
+	 * working in general. It might fail for systems where creating
+	 * memory mappings for runtime code generation is restricted.
+	 */
+	code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL);
+	if (!code)
+		return 0;
+
+	jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0;
+	pcre2_code_free(code);
I'd prefer not having to worry about: Or it might not fail for such
systems, as the pattern is too simple and future versions of
pcre2_compile() could have special case code.
See above, it's unlikely to happen.
@@ -317,8 +342,23 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
 	if (p->pcre2_jit_on) {
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
-		if (jitret)
+		if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) {
+			/*
+			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
+			 * indicated JIT support, the library might still
+			 * fail to generate JIT code for various reasons,
+			 * e.g. when SELinux's 'deny_execmem' or PaX's
+			 * MPROTECT prevent creating W|X memory mappings.
+			 *
+			 * Instead of failing hard, fall back to interpreter
+			 * mode, just as if the pattern was prefixed with
+			 * '(*NO_JIT)'.
+			 */
+			p->pcre2_jit_on = 0;
+			return;
Yes, the "instead of failing hard, fall back" makes sense.  Just
that I do not see why the runtime test is a good thing to have.
It prevents the fallback from being abused and introducing new
regressions. So it's good to have.
                                                                 In
short, we are not in the business of catching bugs in pcre2_jit
implementations, so if they say they cannot compile the pattern (I
would even say I doubt the point of checking the return code to
ensure it is NOMEMORY), it would be fine to let the interpreter
codepath to inspect the pattern and diagnose problems with it, or
take the slow match without JIT.
Yeah, unfortunately they're not gonna fix what's a bug, IMHO. They think
it's a feature: https://github.com/PCRE2Project/pcre2/pull/157

Anyhow, the error code is very well documented, see pcre2_jit_compile(3)
"""
          [...]  The  function can also return PCRE2_ERROR_NOMEMORY if
       JIT is unable to allocate executable memory for  the  compiler,
       even if it was because of a system security restriction.
"""

And that's very much in line with what the test in pcre2_jit_compile()'s
current implementation does.
What am I missing?
Please have a look at Ævar's reasoning here:
https://lore.kernel.org/git/221220.86bknxwy9t.gmgdl@evledraar.gmail.com/

Thanks,
Mathias