On Sun, Jun 25, 2017 at 12:14:04PM +0100, Al Viro wrote:
On Sun, Jun 25, 2017 at 10:53:58AM +0100, Al Viro wrote:
quoted
On Sat, Jun 24, 2017 at 12:29:23PM -0500, Larry Finger wrote:
quoted
I made a break through. If I turn off inline copy to/from users for 32-bit
ppc with the following patch, then the system boots:
OK... So it's 4.6.3 miscompiling something - it is hardware-independent,
reproduced in qemu. I'd like to get more self-contained example of
miscompile, though; should be done by tonight...
OK, it's the call in rw_copy_check_uvector(); with INLINE_COPY_FROM_USER
it's miscompiled by 4.6.3. I hadn't looked through the generated code
yet; will do that after I grab some sleep.
Confirmed. It manages to bugger the loop immediately after the (successful)
copying of iovec array in rw_copy_check_uvector(); both with and without
INLINE_COPY_FROM_USER it has (just before the call of copy_from_user()) r27
set to nr_segs * sizeof(struct iovec). The call is made, we check that it
has succeeded and that's when it hits the fan: without INLINE_COPY_FROM_USER
we have (interleaved with unrelated insns)
addi 27,27,-8
srwi 27,27,3
addi 27,27,1
mtctr 27
Weird, but manages to pass nr_segs to mtctr. _With_ INLINE_COPY_FROM_USER we
get this:
lis 9,0x2000
mtctr 9
In other words, the loop will try to go through 8192 iterations. No idea where
that number has come from, but it sure as hell is wrong. That's where those
-EINVAL, etc. are coming from - we run into something negative in iov[seg].len,
after having run out of on-stack iovec array.
Assembler generated out of rw_copy_check_uvector() with and without
INLINE_COPY_FROM_USER is attached; it's a definite miscompile. Neither 4.4.5
nor 6.3.0 use mtctr/bdnz for that loop.
The bottom line is, ppc cross-toolchain on kernel.org happens to be
the version that miscompiles rw_copy_check_uvector() with INLINE_COPY_FROM_USER
and hell knows what else. Said that, I would rather have ppc32 drop the
INLINE_COPY_{TO,FROM}_USER anyway; that won't fix any other places where
the same 4.6.3 bug hits, but I seriously suspect that it will end up being
faster even on non^Wless buggy gcc versions. Could powerpc folks check
what does removing those two defines from arch/powerpc/include/asm/uaccess.h
do to performance? If there's no slowdown, I would strongly recommend just
removing those as in the patch Larry has posted upthread.
Fixing whatever it is in gcc 4.6.3 that triggers that behaviour is
IMO pointless - it might make sense to switch kernel.org cross-toolchain to
something more recent, but that's it.