Thread (121 messages), 13 authors, 2021-09-24

Re: [RFC] LKMM: Add volatile_if()

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2021-09-24 22:07:46
Also in: linux-toolchains, lkml

----- On Sep 24, 2021, at 4:39 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
----- On Sep 24, 2021, at 3:55 PM, Segher Boessenkool segher@kernel.crashing.org wrote:
Hi!

On Fri, Sep 24, 2021 at 02:38:58PM -0400, Mathieu Desnoyers wrote:
Following the LPC2021 BoF about control dependency, I re-read the kernel
documentation about control dependency, and ended up thinking that what
we have now is utterly fragile.

Considering that the goal here is to prevent the compiler from being able to
optimize a conditional branch into something which lacks the control
dependency, while letting the compiler choose the best conditional
branch in each case, how about the following approach ?

#define ctrl_dep_eval(x)        ({ BUILD_BUG_ON(__builtin_constant_p((_Bool) x)); x; })
#define ctrl_dep_emit_loop(x)   ({ __label__ l_dummy; l_dummy: asm volatile goto ("" : : : "cc", "memory" : l_dummy); (x); })
#define ctrl_dep_if(x)          if ((ctrl_dep_eval(x) && ctrl_dep_emit_loop(1)) || ctrl_dep_emit_loop(0))
[The "cc" clobber only pessimises things: the asm doesn't actually
clobber the default condition code register (which is what "cc" means),
and you can have conditional branches using other condition code
registers, or on other registers even (general purpose registers are
common).]
I'm currently considering removing both "memory" and "cc" clobbers from
the asm goto.
The idea is to forbid the compiler from considering the two branches as
identical by adding a dummy loop in each branch with an empty asm goto.
Considering that the compiler should not assume anything about the
contents of the asm goto (it's been designed so the generated assembly
can be modified at runtime), then the compiler can hardly know whether
each branch will trigger an infinite loop or not, which should prevent
unwanted optimisations.
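
The unwanted optimisation being discussed can be illustrated with a minimal userspace sketch. The `READ_ONCE()`/`WRITE_ONCE()` definitions below are simplified stand-ins for the kernel macros, and `fragile_ctrl_dep()` is a hypothetical function that does not appear in the thread. Because both legs perform the same store, a compiler is allowed to hoist that store above the conditional branch, at which point the store no longer depends on the load of the flag.

```c
/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical example; the name is not from the thread. */
int fragile_ctrl_dep(int *flag, int *data)
{
	if (READ_ONCE(*flag)) {
		WRITE_ONCE(*data, 1);	/* same store in both legs ... */
		return 1;
	} else {
		WRITE_ONCE(*data, 1);	/* ... so it may be hoisted above the branch */
		return 0;
	}
}
```

The function still returns the right value after such a transformation; what is lost is only the ordering between the load of `*flag` and the store to `*data`, which is why the problem is hard to see from C alone.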
The compiler looks if the code is identical, nothing more, nothing less.
There are no extra guarantees.  In principle the compiler could see both
copies are empty asms looping to self, and so consider them equal.
I would expect the compiler not to attempt combining asm goto based on their
similarity because it has been made clear starting from the original
requirements from the kernel community to the gcc developers that one major
use-case of asm goto involves self-modifying code (patching between nops and
jumps).

If this happens to be a real possibility, then we may need to work around this
for other uses of asm goto as well.
Now that I page this stuff back into my brain (I last looked at it in detail some
12 years ago), I recall that letting compilers combine asm goto statements which
happen to match (CSE) was actually something we wanted to permit, because we don't care
about editing the nops into jumps for each individual asm goto if they happen
to have the same effect when modified.
If there is indeed a scenario where the compiler can combine similar asm goto
statements, then I suspect we may want to emit unique dummy code in the assembly
which gets placed in a discarded section, e.g.:

#define ctrl_dep_emit_loop(x)   ({ __label__ l_dummy; l_dummy: asm goto (       \
                ".pushsection .discard.ctrl_dep\n\t"                            \
                ".long " __stringify(__COUNTER__) "\n\t"                        \
                ".popsection\n\t"                                               \
                "" : : : : l_dummy); (x); })
So I think your point is very much valid: we need some way to make the content of the asm goto
different between the two branches. I think the __COUNTER__ approach is overkill though:
we don't care about making each asm goto loop unique within the entire file;
we just don't want them to match between the two legs of the branch.

So something like this should be enough:

#define ctrl_dep_emit_loop(x)   ({ __label__ l_dummy; l_dummy: asm goto (       \
                ".pushsection .discard.ctrl_dep\n\t"                            \
                ".long " __stringify(x) "\n\t"                                  \
                ".popsection\n\t"                                               \
                "" : : : : l_dummy); (x); })

So we emit respectively a 0 and 1 into the discarded section.

Thoughts ?

Thanks,

Mathieu

But then a similar trick would be needed for jump labels as well.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com