Thread: 16 messages, 6 authors, 2020-10-30

Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE

From: Ard Biesheuvel <ardb@kernel.org>
Date: 2020-10-28 23:24:05
Also in: bpf, lkml

On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
[off-list ref] wrote:
On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
[off-list ref] wrote:
On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.

However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and results in all other optimization
options being forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info being emitted for the function.

This reverts commit 3193c0836, and instead, it disables the -fgcse
optimization for the entire source file, but only when building for
X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.

Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/compiler-gcc.h   | 2 --
 include/linux/compiler_types.h | 4 ----
 kernel/bpf/Makefile            | 6 +++++-
 kernel/bpf/core.c              | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index d1e3c6896b71..5deb37024574 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -175,5 +175,3 @@
 #else
 #define __diag_GCC_8(s)
 #endif
-
-#define __no_fgcse __attribute__((optimize("-fno-gcse")))
See my reply in the other thread.
I prefer
-#define __no_fgcse __attribute__((optimize("-fno-gcse")))
+#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))

Potentially with -fno-asynchronous-unwind-tables.
So how would that work? arm64 has the following:

KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables

ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
KBUILD_CFLAGS += -ffixed-x18
endif

and it adds -fpatchable-function-entry=2 for compilers that support
it, but only when CONFIG_FTRACE is enabled.
I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
That's not the case.
So which flags does it drop, and which doesn't it drop? Is that
documented somewhere? Is that the same for all versions of GCC?
Also, as Nick pointed out, -fno-gcse does not work on Clang.
yes and what's the point?
#define __no_fgcse is GCC only. clang doesn't need this workaround.
Ah ok, that's at least something.
Every architecture will have a different set of requirements here. And
there is no way of knowing which -f options are disregarded when you
use the function attribute.

So how on earth are you going to #define __no_fgcse correctly for
every configuration imaginable?
__attribute__((optimize(""))) is not as broken as you're claiming it to be.
It has quirky gcc internal logic, but it's still widely used
in many software projects.
So it's fine because it is only a little bit broken? I'm sorry, but
that makes no sense whatsoever.

If you insist on sticking with this broken construct, can you please
make it GCC/x86-only at least?
I'm totally fine with making
#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
to be gcc+x86 only.
I'd like to get rid of it, but objtool is not smart enough to understand
generated asm without it.
I'll defer to the x86 folks to make the final call here, but I would
be perfectly happy doing

index d1e3c6896b71..68ddb91fbcc6 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -176,4 +176,6 @@
 #define __diag_GCC_8(s)
 #endif

+#ifdef CONFIG_X86
 #define __no_fgcse __attribute__((optimize("-fno-gcse")))
+#endif
and end the conversation here, because I honestly cannot wrap my head
around the fact that you are willing to work around an x86 specific
objtool shortcoming by arbitrarily disabling some GCC optimization for
all architectures, using a construct that may or may not affect other
compiler settings in unpredictable ways, where the compiler is being
used to compile a BPF language runtime for executing BPF programs
inside the kernel.

What on earth could go wrong?
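
For illustration only, the GCC/x86-only guard discussed above could be
sketched in standalone C roughly as follows. This is not the kernel code:
no_fgcse_sketch, sketch_insn and sketch_run() are hypothetical names, and
compiler-predefined macros stand in for CONFIG_X86, which does not exist
outside a kernel build. The toy loop only marks where the attribute would
sit on an interpreter function such as ___bpf_prog_run().

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for __no_fgcse. The attribute is only applied
 * when building with GCC (not Clang) for x86; everywhere else the macro
 * expands to nothing, so other architectures keep their usual flags.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define no_fgcse_sketch __attribute__((optimize("-fno-gcse")))
#else
#define no_fgcse_sketch
#endif

/* Made-up opcodes for a toy accumulator machine. */
enum sketch_op { OP_ADD, OP_SUB, OP_EXIT };

struct sketch_insn {
	enum sketch_op op;
	int imm;
};

/* Toy interpreter loop; the attribute applies to this function only. */
static no_fgcse_sketch int sketch_run(const struct sketch_insn *insn,
				      size_t len)
{
	int acc = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		switch (insn[i].op) {
		case OP_ADD:
			acc += insn[i].imm;
			break;
		case OP_SUB:
			acc -= insn[i].imm;
			break;
		case OP_EXIT:
			return acc;
		}
	}
	return acc;
}
```

The guard limits where the attribute is used; it does not answer the
question raised above of which other command-line options GCC keeps or
drops for a function that carries it.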