Thread (51 messages), 8 authors, 2012-03-09

[PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

From: Dave Martin <hidden>
Date: 2012-03-09 15:58:07
Also in: kvm, lkml, xen-devel

On Thu, Mar 8, 2012 at 6:47 PM, Richard Earnshaw
[off-list ref] wrote:
On 08/03/12 17:21, Nicolas Pitre wrote:
On Thu, 8 Mar 2012, Richard Earnshaw wrote:
On 02/03/12 21:15, Nicolas Pitre wrote:
So, to me, the gcc documentation is perfectly clear on this topic.
There really _is_ a guarantee that those asm marked variables will be in
the expected registers on entry to the inline asm, given that the
variable is _also_ listed as an operand to the asm statement.  But only
in that case.

It is true that gcc may reorder other function calls or other code
around the inline asm and then intervening code can clobber any
registers.  Then it is up to gcc to preserve the variable's content
elsewhere when its register is used for other purposes, and restore it
when some inline asm statement is referring to it.

And if gcc does not do this then it is buggy.  Version 3.4.0 of gcc was
buggy.  No other gcc versions in the last 7 years had such a problem or
the __asmeq macro in the kernel would have told us.
Or, to summarise another way, there is no way to control which register
is used to pass something to an inline asm in general (often we get away
with this, and there are a lot of inline asms in the kernel that assume
it works, but the more you inline the more likely you are to get nasty
surprises).
This statement is therefore unfounded and wrong.  Please direct the
tools guy who misled you to the above gcc documentation.
The problem is not really about re-ordering functions but about implicit
functions that come from the source code; for example

int foo (int a, int b)
{
  register int x __asm__("r0") = 33;

  register int c __asm__("r1") = a / b; /* Ooops, clobbers r0 with
                                           division function call.  */

  asm ("svc 0" : : "r" (x));
}

There's nothing in the specification to say what happens if there's a
statement in the code that causes an implicit clobber of your assembly
register.
I'm sure gcc is full of implicit behaviors that are not mentioned in
the specification.  But as long as the specification is respected, then
there is no need to mention any unobservable side effects from a program
flow point of view, right?

Why wouldn't gcc be able to respect the documented feature by
preventing live variables from being clobbered and reloading them in
the specified register at the inline asm entry point, just like it does
for function calls?

Here's some example code showing that, unfortunately, gcc is still
broken with regard to the documented behavior:

extern int bar(int);
int foo(int y)
{
        register int x __asm__("r1") = 33;
        y += bar(x);
        asm ("@ x should be live in %0 here" : "+r" (x) : "r" (y));
        y += bar(x);
        asm ("@ x should be live in %0 here" : "+r" (x) : "r" (y));
        return x;
}

Result is:

foo:
        stmfd   sp!, {r4, lr}
        mov     r4, r0
        mov     r0, #33
        bl      bar
        add     r4, r0, r4
        @ x should be live in r1 here
        mov     r0, r1
        bl      bar
        add     r0, r0, r4
        @ x should be live in r1 here
        mov     r0, r1
        ldmfd   sp!, {r4, lr}
        bx      lr

To me this is clearly a bug if gcc is not able to meet the documented
expectation.  And the documented expectation is not at all unreasonable.
No, in this case it is presumed that /you/ know that calling bar() will
modify x.  Thus the code is either well defined (if you know what is in
r1 after each call to bar), or undefined (if you can't say anything
about r1 after each call).
It could be argued that, since the set of registers involved in the PCS
is well known, a programmer who assigns a variable to one of those
registers is consciously aliasing the variable with a global register
which can be destroyed at any time as a consequence of the ABI.
Because there are few guarantees about how the compiler will or won't
transform the code, this suggests that asm("rX") annotations can't
work reliably for r0-r3 or r12 with the ARM PCS.

Indeed, the GCC docs do in fact have this to say:

    "register int *p1 asm ("r0") = ...;
    register int *p2 asm ("r1") = ...;
    register int *result asm ("r0");
    asm ( [...] );

[...] beware that a register that is call-clobbered by the target ABI
will be overwritten by any function call in the assignment including
library calls for arithmetic operators.  Also a register may be
clobbered when generating some operations, like variable shift, memory
copy or memory move on x86.  Assuming it is a call-clobbered register,
this may happen to `r0' above by the assignment to `p2'.  If you have
to use such a register, use temporary variables for expressions
between the register assignment and use:

    int t1 = ...;
    register int *p1 asm("r0") = ...;
    register int *p2 asm("r1") = t1;
    register int *result asm("r0");
    asm ( [...] )"

But this is at least somewhat in conflict with "The compiler's data
flow analysis is capable of determining where the specified registers
contain live values, and where they are available for other uses."

It also seems to assume -O0 type behaviour, where the compiler does a
straightforward sequential translation of the code.  I have no idea why
it is guaranteed that the assignment to p2 now certainly does not
clobber p1 (even as a side effect); what the implied aliasing of result
with p1 actually guarantees (or whether the compiler really understands
it at all); or what constraints there are on the compiler reordering or
inserting random extraneous code into the above.  Such assumptions
don't feel very safe in the presence of optimisation.

In other words, all sorts of undocumented guarantees beyond the C
language are needed for it even to be possible to interpret what the
above code examples should mean in the first place.

The documentation leaves a lot of questions unanswered, but it does at
least suggest that other arches have the same kind of potential
pitfalls that we observed on ARM.
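
For illustration, here is a minimal sketch of the temporaries pattern
from the GCC documentation quoted above, applied to the division example
from earlier in the thread.  The IN_REG macro is a hypothetical helper
(not from the kernel or from GCC) that only exists so the sketch also
builds on non-ARM hosts, and the empty asm body stands in for "svc 0".
Whether the compiler is actually obliged to keep the two register
assignments adjacent to the asm is exactly the open question above, so
treat this as the documented idiom rather than a proven guarantee.

```c
/* Hypothetical helper: binds a variable to a named register on ARM and
 * expands to nothing elsewhere, so the sketch still compiles on other
 * hosts (where no particular register is requested). */
#if defined(__arm__)
#define IN_REG(name) __asm__(name)
#else
#define IN_REG(name)
#endif

int foo(int a, int b)
{
    /* Evaluate anything that might turn into a function call first.
     * On ARM cores without hardware divide, a / b may become a call to
     * a library routine such as __aeabi_idiv, which is free to clobber
     * r0-r3 and r12. */
    int quotient = a / b;

    /* Only now bind the values to specific registers, using plain
     * copies from temporaries so that no call should be needed between
     * the assignments and the asm. */
    register int x IN_REG("r0") = 33;
    register int c IN_REG("r1") = quotient;

    /* Placeholder for the real instruction (e.g. "svc 0").  Listing
     * both variables as operands is what the documentation relies on. */
    __asm__ volatile ("" : "+r" (x) : "r" (c));

    return x + c;
}
```

With the placeholder asm, foo(10, 2) simply returns 33 + 5; the point is
only the ordering of the division relative to the register assignments.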


Register variables feel like a red herring though.  We're only using
those because we can't do what is actually needed and describe these
requirements in the asm constraints (which would seem to be the
right place).  We specifically don't care where those values are
except at the boundaries of the asm block itself.

Is there a reason why ARM gcc doesn't provide the ability to specify
such exact-register constraints, or is this more for historical
reasons?  Is it possible?
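
As a point of comparison, and only as a sketch: on x86, GCC's machine
constraints include single-register letters such as "a" (the a register)
and "c" (the c register), so the placement can be stated in the asm
operands themselves with no register variables involved.  The function
name below is made up for the example, and the non-x86 branch is just a
plain C fallback so the sketch builds elsewhere.

```c
/* Hypothetical example: add two ints with both operands pinned to
 * specific registers purely through asm constraints. */
int add_in_fixed_regs(int sum, int addend)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+a" places sum in eax as a read-write operand; "c" places
     * addend in ecx.  The compiler inserts whatever moves are needed
     * at the asm boundary and is free to keep the values anywhere
     * else before and after.  AT&T syntax: eax += ecx. */
    __asm__ ("addl %%ecx, %%eax" : "+a" (sum) : "c" (addend) : "cc");
#else
    /* Fallback for other architectures: ordinary C addition. */
    sum += addend;
#endif
    return sum;
}
```

This is the kind of "only at the boundaries of the asm block" contract
described above; the ARM examples in this thread have to approximate it
with register variables instead.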

Cheers
---Dave