Re: [PATCH v7 04/25] arm64: Substitute gettimeofday with C implementation | linux-arm-kernel

quoted

On Thu, Jun 27, 2019 at 11:57:36AM +0100, Vincenzo Frascino wrote:
Hi Dave,

Overall, I want to thank you for bringing out the topic. It helped me to
question some decisions and make sure that we have no holes left in
the approach.
Fair enough.

This is really just a nasty compiler corner-case... the validity of the
overall approach isn't affected.

vDSO library is a shared object not compiled with LTO as far as I can
see, hence if this involved LTO should not applicable in this case.
That turned to be a spurious hypothesis on my part -- LTO isn't the
smoking gun.  (See below.)
Ok.

The classic example of this (triggered directly and not due to inlining)
would be something like:

int bar(int, int);

void foo(int x, int y)
{
	register int x_ asm("r0") = x;
	register int y_ asm("r1") = bar(x, y);

	asm volatile (
		"svc	#0"
		:: "r" (x_), "r" (y_)
		: "memory"
	);
}

->

0000000000000000 <foo>:
   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
   4:   910003fd        mov     x29, sp
   8:   94000000        bl      0 <bar>
   c:   2a0003e1        mov     w1, w0
  10:   d4000001        svc     #0x0
  14:   a8c17bfd        ldp     x29, x30, [sp], #16
  18:   d65f03c0        ret
Contextualized to what my vdso fallback functions do, this should not be a
concern because in no case a function result is directly set to a variable
declared as register.

Since the vdso fallback functions serve a very specific and limited purpose, I
do not expect that that code is going to change much in future.

The only thing that can happen is something similar to what I wrote in my
example, which as I empirically proved does not trigger the problematic behavior.

The gcc documentation is vague and ambiguous about precisely whan this
can happen and about how to avoid it.
On this I agree, it is not very clear, but this seems more something to raise
with the gcc folks in order to have a more "explicit" description that leaves no
room to the interpretation.

...

However, the workaround is cheap, and to avoid the chance of subtle
intermittent code gen bugs it may be worth it:

void foo(int x, int y)
{
	asm volatile (
		"mov	x0, %0\n\t"
		"mov	x1, %1\n\t"
		"svc	#0"
		:: "r" (x), "r" (bar(x, y))
		: "r0", "r1", "memory"
	);
}

->

0000000000000000 <foo>:
   0:   a9be7bfd        stp     x29, x30, [sp, #-32]!
   4:   910003fd        mov     x29, sp
   8:   f9000bf3        str     x19, [sp, #16]
   c:   2a0003f3        mov     w19, w0
  10:   94000000        bl      0 <bar>
  14:   2a0003e2        mov     w2, w0
  18:   aa1303e0        mov     x0, x19
  1c:   aa0203e1        mov     x1, x2
  20:   d4000001        svc     #0x0
  24:   f9400bf3        ldr     x19, [sp, #16]
  28:   a8c27bfd        ldp     x29, x30, [sp], #32
  2c:   d65f03c0        ret

What do you think?
The solution seems ok, thanks for providing it, but IMHO I think we
should find a workaround for something that is broken, which, unless
I am missing something major, this seems not the case.
So, after a bit of further experimentation, I found that I could trigger
it with implicit function calls on an older compiler.  I couldn't show
it with explicit function calls (as in your example).

With the following code, inlining if an expression that causes an
implicit call to a libgcc helper can trigger this issue, but I had to
try an older compiler:

int foo(int x, int y)
{
	register int res asm("r0");
	register const int x_ asm("r0") = x;
	register const int y_ asm("r1") = y;

	asm volatile (
		"svc	#0"
		: "=r" (res)
		: "r" (x_), "r" (y_)
		: "memory"
	);

	return res;
}

int bar(int x, int y)
{
	return foo(x, x / y);
}

-> (arm-linux-gnueabihf-gcc 9.1 -O2)

00000000 <foo>:
   0:   df00            svc     0
   2:   4770            bx      lr

00000004 <bar>:
   4:   b510            push    {r4, lr}
   6:   4604            mov     r4, r0
   8:   f7ff fffe       bl      0 <__aeabi_idiv>
   c:   4601            mov     r1, r0
   e:   4620            mov     r0, r4
  10:   df00            svc     0
  12:   bd10            pop     {r4, pc}

-> (arm-linux-gnueabihf-gcc 5.1 -O2)

00000000 <foo>:
   0:   df00            svc     0
   2:   4770            bx      lr

00000004 <bar>:
   4:   b508            push    {r3, lr}
   6:   f7ff fffe       bl      0 <__aeabi_idiv>
   a:   4601            mov     r1, r0
   c:   df00            svc     0
   e:   bd08            pop     {r3, pc}
Thanks for reporting this. I had a go with gcc-5.1 on the vDSO library and seems
Ok, but it was worth trying.

For obvious reasons I am not reporting the objdump here :)

I was struggling to find a way to emit an implicit function call for
AArch64, except for 128-bit divide, which would complicate things since
uint128_t doesn't fit in a single register anyway.

Maybe this was considered a bug and fixed sometime after GCC 5, but I
think the GCC documentation is still quite unclear on the semantics of
register asm vars that alias call-clobbered registers in the PCS.

If we can get a promise out of the GCC folks that this will not happen
with any future compiler, then maybe we could just require a new enough
compiler to be used.
On this I fully agree, the compiler should never change an "expected" behavior.

If the issue comes from a gray area in the documentation, we have to address it
and have it fixed there.

The minimum version of the compiler from linux-4.19 is 4.6, hence I had to try
that the vDSO lib does not break with 5.1 [1].

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cafa0010cd51fb711fdcb50fc55f394c5f167a0a
OK

Then of course there is clang.
I could not help myself and I tried clang.8 and clang.7 as well with my example,
just to make sure that we are fine even in that case. Please find below the
results (pretty identical).

main.clang.7.o:	file format ELF64-aarch64-little

Disassembly of section .text:
0000000000000000 show_it:
       0:	e8 03 1f aa 	mov	x8, xzr
       4:	09 68 68 38 	ldrb	w9, [x0, x8]
       8:	08 05 00 91 	add	x8, x8, #1
       c:	c9 ff ff 34 	cbz	w9, #-8 <show_it+0x4>
      10:	02 05 00 51 	sub	w2, w8, #1
      14:	e1 03 00 aa 	mov	x1, x0
      18:	08 08 80 d2 	mov	x8, #64
      1c:	01 00 00 d4 	svc	#0
      20:	c0 03 5f d6 	ret

main.clang.8.o:	file format ELF64-aarch64-little

Disassembly of section .text:
0000000000000000 show_it:
       0:	e8 03 1f aa 	mov	x8, xzr
       4:	09 68 68 38 	ldrb	w9, [x0, x8]
       8:	08 05 00 91 	add	x8, x8, #1
       c:	c9 ff ff 34 	cbz	w9, #-8 <show_it+0x4>
      10:	02 05 00 51 	sub	w2, w8, #1
      14:	e1 03 00 aa 	mov	x1, x0
      18:	08 08 80 d2 	mov	x8, #64
      1c:	01 00 00 d4 	svc	#0
      20:	c0 03 5f d6 	ret

Commands used:

$ clang -target aarch64-linux-gnueabi main.c -O -c -o main.clang.<x>.o
$ llvm-objdump -d main.clang.<x>.o
Actually, I'm not sure this is comparable with the reproducer I quoted
in my last reply.

The compiler can see the definition of strlen and fully inlines it.
I only ever saw the problem when the compiler emits an out-of-line
implicit function call.

What does clang do with my example on 32-bit?

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help