Thread: 22 messages, 5 authors, 2021-11-17

Re: [PATCH v3 7/7] ARM: implement support for vmap'ed stacks

From: "Russell King (Oracle)" <linux@armlinux.org.uk>
Date: 2021-11-16 20:08:34

On Tue, Nov 16, 2021 at 08:28:02PM +0100, Ard Biesheuvel wrote:
(+ Tony and linux-omap@)

On Tue, 16 Nov 2021 at 10:23, Guillaume Tucker
[off-list ref] wrote:
Hi Ard,

Please see the bisection report below about a boot failure on
omap4-panda which is pointing to this patch.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Some more details can be found here:

  https://linux.kernelci.org/test/case/id/6191b1b97c175a5ade335948/

It seems like the kernel just froze after about 3 seconds without
any obvious errors in the log.

Please let us know if you need any help debugging this issue or
if you have a fix to try.

Thanks for the report.

I wonder if this might be related to low level platform code running
off a different stack (maybe in SRAM?) when an interrupt is taken? Or
using a different set of page tables that are out of sync in terms of
VMALLOC space mappings?

Could anyone who speaks OMAP please take a look at the linked boot
log, and hopefully make sense of it?

For background, this series enables vmap'ed stacks support for ARMv7,
which means that the entry code checks whether the stack pointer may
be pointing into the guard region before the vmalloc'ed stack, and
kills the task if it looks like the kernel stack overflowed.
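
To make the check described above concrete, here is a minimal sketch of a
guard-region test in plain C. It is an illustration only, not the code from
this series: the sizes, the address in the usage note, and the name
sp_in_guard_region() are all assumptions, and the real entry code performs
its test in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes, chosen for illustration only. */
#define STACK_SIZE  (8u * 1024u)   /* usable stack */
#define GUARD_SIZE  (4u * 1024u)   /* unmapped guard region below the stack */

/*
 * Returns true if sp lies in the guard region directly below a stack whose
 * lowest usable address is stack_base. Stacks grow down, so an overflow
 * moves sp below stack_base, into [stack_base - GUARD_SIZE, stack_base).
 */
static bool sp_in_guard_region(uintptr_t sp, uintptr_t stack_base)
{
    assert(stack_base >= GUARD_SIZE);   /* keep the subtraction from wrapping */
    return sp < stack_base && sp >= stack_base - GUARD_SIZE;
}
```

With a made-up base of 0xf0802000, an sp just below the top of the stack is
reported as fine, while an sp 8 bytes below the base is reported as being in
the guard region, which is the case where the task would be killed.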

Here's another instance:
https://linux.kernelci.org/build/id/6193fa5c6c4e1d02bd3358ff/

Everything builds and boots happily, but odd things happen on OMAP
based devices: Panda just gives up right after discovering the USB
controller, and Beagle-XM just starts showing all kinds of weird
crashes at roughly the same point in the boot.

I haven't looked at the logs yet, but there may be a more
fundamental reason for the stall.

vmalloc space is mapped lazily into the page tables of every process
other than the one in which the allocation happened - specifically,
the L1 entries are filled in on demand.
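
A toy user-space model of that lazy behaviour, to make the sequence explicit.
Everything here is an assumption made for illustration: the toy_ names, the
8-entry table, and the use of 0 as "not mapped" do not correspond to the
kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define L1_ENTRIES 8                /* toy table size, not the real ARM value */

struct toy_pgd {
    unsigned long l1[L1_ENTRIES];   /* 0 means "not mapped" */
};

/* Reference table; stands in for the kernel's master page tables. */
static struct toy_pgd toy_master;

/* A vmalloc-style mapping is installed in the master table only. */
static void toy_vmalloc_map(size_t idx, unsigned long entry)
{
    assert(idx < L1_ENTRIES && entry != 0);
    toy_master.l1[idx] = entry;
}

/* A new table starts as a snapshot of the master at that moment. */
static void toy_pgd_init(struct toy_pgd *pgd)
{
    *pgd = toy_master;
}

static bool toy_is_mapped(const struct toy_pgd *pgd, size_t idx)
{
    assert(idx < L1_ENTRIES);
    return pgd->l1[idx] != 0;
}

/*
 * Lazy sync, run from the fault path: copy the missing L1 entry from the
 * master table. Returns false if the master has no mapping either, which
 * would be a genuine bad access.
 */
static bool toy_sync_on_fault(struct toy_pgd *pgd, size_t idx)
{
    assert(idx < L1_ENTRIES);
    if (toy_master.l1[idx] == 0)
        return false;
    pgd->l1[idx] = toy_master.l1[idx];
    return true;
}
```

In this model, a table initialised before toy_vmalloc_map() is called does
not see the new entry until toy_sync_on_fault() has run for it.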

When a new thread is created, you're vmalloc()ing a kernel stack.
This is done in the parent task for the child task. If the child
task's page tables don't contain the L1 entry for its vmalloc'd
stack, then the first stack access by the child will fault.

The fault processing will be done in the child's context, so we
immediately try to save the state to the child's kernel stack,
which is not yet mapped. The result is another fault, which
triggers yet another fault, etc.
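
The loop above can be sketched as a small model. This is only a way of
stating the argument in code, under the assumption that exception entry has
to write to the current task's kernel stack before any handler runs; the
toy_ names and the nesting cap are invented so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NESTING 4   /* arbitrary cap so the model terminates */

struct toy_task {
    bool stack_l1_present;  /* is the stack's L1 entry in this task's tables? */
};

/* The fix-up the fault handler would apply if it ever got to run. */
static void toy_fixup(struct toy_task *t)
{
    assert(t != 0);
    t->stack_l1_present = true;
}

/*
 * Models exception entry followed by the handler. Entry must first save
 * state to the current task's kernel stack; if that stack is unmapped, the
 * save itself faults and entry starts again before the handler is reached.
 * Returns the number of exception entries taken, or -1 if the cap was hit.
 */
static int toy_take_fault(struct toy_task *t)
{
    for (int entries = 1; entries <= TOY_MAX_NESTING; entries++) {
        if (!t->stack_l1_present)
            continue;       /* saving state faulted: nested entry */
        toy_fixup(t);       /* handler body ran */
        return entries;
    }
    return -1;              /* never reached the handler */
}
```

In the model, a task whose stack entry is missing never reaches toy_fixup(),
whereas a task whose entry is already present is handled on the first entry.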

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel