Thread (21 messages) 21 messages, 3 authors, 2020-05-08

Re: [PATCH] bpf: Tweak BPF jump table optimizations for objtool compatibility

From: Alexei Starovoitov <hidden>
Date: 2020-05-02 03:06:29
Also in: bpf, lkml

On Fri, May 01, 2020 at 02:56:17PM -0500, Josh Poimboeuf wrote:
On Fri, May 01, 2020 at 12:40:53PM -0700, Alexei Starovoitov wrote:
quoted
On Fri, May 01, 2020 at 02:22:04PM -0500, Josh Poimboeuf wrote:
quoted
On Fri, May 01, 2020 at 12:09:30PM -0700, Alexei Starovoitov wrote:
quoted
On Thu, Apr 30, 2020 at 02:07:43PM -0500, Josh Poimboeuf wrote:
quoted
Objtool decodes instructions and follows all potential code branches
within a function.  But it's not an emulator, so it doesn't track
register values.  For that reason, it usually can't follow
intra-function indirect branches, unless they're using a jump table
which follows a certain format (e.g., GCC switch statement jump tables).

In most cases, the generated code for the BPF jump table looks a lot
like a GCC jump table, so objtool can follow it.  However, with
RETPOLINE=n, GCC keeps the jump table address in a register, and then
does 160+ indirect jumps with it.  When objtool encounters the indirect
jumps, it can't tell which jump table is being used (or even whether
they might be sibling calls instead).

This was fixed before by disabling an optimization in ___bpf_prog_run(),
using the "optimize" function attribute.  However, that attribute is bad
news.  It doesn't append options to the command-line arguments.  Instead
it starts from a blank slate.  And according to recent GCC documentation
it's not recommended for production use.  So revert the previous fix:

  3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")

With that reverted, solve the original problem in a different way by
getting rid of the "goto select_insn" indirection, and instead just goto
the jump table directly.  This simplifies the code a bit and helps GCC
generate saner code for the jump table branches, at least in the
RETPOLINE=n case.

But, in the RETPOLINE=y case, this simpler code actually causes GCC to
generate far worse code, ballooning the function text size by +40%.  So
leave that code the way it was.  In fact Alexei prefers to leave *all*
the code the way it was, except where needed by objtool.  So even
non-x86 RETPOLINE=n code will continue to have "goto select_insn".

This stuff is crazy voodoo, and far from ideal.  But it works for now.
Eventually, there's a plan to create a compiler plugin for annotating
jump tables.  That will make this a lot less fragile.
I don't like this commit log.
Here you're saying that the code recognized by objtool is sane and good
whereas well optimized gcc code is somehow voodoo and bad.
That is just wrong.
I have no idea what you're talking about.

Are you saying that ballooning the function text size by 40% is well
optimized GCC code?  It seems like a bug to me.  That's the only place I
said anything bad about GCC code.
It could be a bug, but did you benchmark the speed of interpreter ?
Is it faster or slower with 40% more code ?
Did you benchmark it on other archs ?
I thought we were in agreement that 40% text growth is bad.  Isn't that
why you wanted to keep 'goto select_insn' for the retpoline case?
Let me see whether I got this right.
In first the sentence above you're claiming that I've agreed that 
'goto select_insn' is bad for retpoline case and in the second sentence
you're saying that I wanted to keep it because it's bad?
In other words you're saying I wanted bad code for retpoline case for
some mischievous purpose?
Do you really think so or just trolling?

Let's look at the facts.
I've applied your patch and the kernel crashed on the very first test in
selftests/bpf which makes me believe that you only compile tested it.
Taking the question "is 40% text growth is bad?" out of context... Ohh yes.
but if 40% extra code gives 10% speedup to interpreter it's suddenly good, right?
Since you didn't run basic tests I don't think you've tested performance either.
So this direct->indirect patch might cause performance degradation to
architectures that don't have JIT.
On x86-64 JIT=y is the default, so I'm fine taking that performance risk
only for the case where that risk has to be taken. In other words to help
objtool understand the code and only for the case where objtool cannot do
it with existing code.
The 40% potential text decrease after direct->indirect transiton is
irrelevant here. It must be a separate patch after corresponding
performance benchmarking is done.
Just claiming in commit log that current code is obviously bad
is misleading to folks who will be reading it later.

Also as I explained earlier direct->indirect in C is not a contract
for the compiler. Currently there is single C line:
goto *jumptable[insn->code];
but gcc/clang may generate arbitrary number of indirect jumps
for this function.
Changing the macro from "goto select_insn" to "goto *jumptable"
messes with compiler optimizations and there is no guarantee
that the code is going to be better or worse.
Why do you think there are two identical macros there?
#define CONT     ({ insn++; goto select_insn; })
#define CONT_JMP ({ insn++; goto select_insn; })
Why not one?
The answer is in old patch from 2014:
https://patchwork.ozlabs.org/project/netdev/patch/1393910304-4004-2-git-send-email-ast@plumgrid.com/
+#define CONT ({insn++; LOAD_IMM; goto select_insn; })
+#define CONT_JMP ({insn++; LOAD_IMM; goto select_insn; })
+/* some compilers may need help:
+ * #define CONT_JMP ({insn++; LOAD_IMM; goto *jumptable[insn->code]; })
+ */

That was the patch after dozens of performance experiments
with different gcc versions on different cpus.
Six years ago the interpreter performance could be improved
if _one_ of these macros replaced direct with indirect
for certain versions of gcc. But not both macros.

To be honest I don't think you're interested in doing performance
analysis here. You just want to shut up that objtool warning and
move on, right? So please do so without making misleading statements
about goodness or badness of generated code.

Thanks
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help