[PATCH 2/2] arm64: Clear the stack

From: Alexander Popov <hidden>
Date: 2018-05-13 08:40:13
Also in: lkml

Possibly related (same subject, not in this thread)

2018-07-19 · Re: [PATCH 2/2] arm64: Clear the stack · Mark Rutland <mark.rutland@arm.com>
2018-07-19 · Re: [PATCH 2/2] arm64: Clear the stack · Alexander Popov <hidden>
2018-07-19 · Re: [PATCH 2/2] arm64: Clear the stack · Kees Cook <hidden>
2018-07-18 · [PATCH 2/2] arm64: Clear the stack · Laura Abbott <hidden>
2018-02-22 · Re: [PATCH 2/2] arm64: Clear the stack · Laura Abbott <hidden>

Hello Mark,

Thanks a lot for your reply!

On 11.05.2018 19:13, Mark Rutland wrote:

On Fri, May 11, 2018 at 06:50:09PM +0300, Alexander Popov wrote:

quoted

On 06.05.2018 11:22, Alexander Popov wrote:

quoted

On 04.05.2018 14:09, Mark Rutland wrote:

quoted

+	stack_left = sp & (THREAD_SIZE - 1);
+	BUG_ON(stack_left < 256 || size >= stack_left - 256);

Is this arbitrary, or is there something special about 256?

Even if this is arbitrary, can we give it some mnemonic?

It's just a reasonable number. We can introduce a macro for it.

I'm just not sure I see the point in the offset, given things like
VMAP_STACK exist. BUG_ON() handling will likely require *more* than 256
bytes of stack, so it seems superfluous, as we'd be relying on stack
overflow detection at that point.

I can see that we should take the CONFIG_SCHED_STACK_END_CHECK offset
into account, though.

Mark, thank you for such an important remark!

In Kconfig STACKLEAK implies but doesn't depend on VMAP_STACK. In fact x86_32
doesn't have VMAP_STACK at all but can have STACKLEAK.

[Adding Andy Lutomirski]

I've made some additional experiments: I exhaust the thread stack to have only
(MIN_STACK_LEFT - 1) bytes left and then force alloca. If VMAP_STACK is
disabled, BUG_ON() handling causes stack depth overflow, which is detected by
SCHED_STACK_END_CHECK. If VMAP_STACK is enabled, the kernel hangs on BUG_ON()
handling! Enabling CONFIG_PROVE_LOCKING gives the needed report from VMAP_STACK:

I can't see why CONFIG_VMAP_STACK would only work in conjunction with
CONFIG_PROVE_LOCKING.

On arm64 at least, if we overflow the stack while handling a BUG(), we
*should* trigger the overflow handler as usual, and that should work,
unless I'm missing something.

Maybe it gets part-way into panic(), sets up some state,
stack-overflows, and we get wedged because we're already in a panic?
Perhaps CONFIG_PROVE_LOCKING causes more stack to be used, so it dies a
little earlier in panic(), before setting up some state that causes
wedging.

That seems likely. I later noticed that I had oops=panic kernel parameter.

... which sounds like something best fixed in those code paths, and not
here.

quoted

[...]

quoted

I can't say why VMAP_STACK report hangs during BUG_ON() handling on defconfig.
Andy, can you give a clue?

I see that MIN_STACK_LEFT = 2048 is enough for BUG_ON() handling on both x86_64
and x86_32. So I'm going to:
 - set MIN_STACK_LEFT to 2048;
 - improve the lkdtm test to cover this case.

Mark, Kees, Laura, does it sound good?


Could you have a look at the following changes in check_alloca() before I send
the next version?

If VMAP_STACK is enabled and alloca causes stack depth overflow, I write to
guard page below the thread stack to cause double fault and VMAP_STACK report.

On arm64 at least, writing to the guard page will not itself trigger a
stack overflow, but will trigger a data abort. I suspect similar is true
on x86, if the stack pointer is sufficiently far above the guard page.

Yes, you are right, my mistake.

The comment about CONFIG_VMAP_STACK in arch/x86/kernel/traps.c says:
"If we overflow the stack into a guard page, the CPU will fail to deliver #PF
and will send #DF instead."

quoted

If VMAP_STACK is disabled, I use MIN_STACK_LEFT = 2048, which seems to be enough
for BUG_ON() handling both on x86_32 and x86_64. Unfortunately, I can't
guarantee that it is always enough.

I don't think that we can choose something that's guaranteed to be
sufficient for BUG() handling and also not wasting a tonne of space
under normal operation.

Let's figure out what's going wrong on x86 in the case that you mention,
and try to solve that.

Here I don't think we should reserve space at all -- it's completely
arbitrary, and as above we can't guarantee that it's sufficient anyway.

quoted

 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
-#define MIN_STACK_LEFT 256
+#define MIN_STACK_LEFT 2048

 void __used check_alloca(unsigned long size)
 {
        unsigned long sp = (unsigned long)&sp;
        struct stack_info stack_info = {0};
        unsigned long visit_mask = 0;
        unsigned long stack_left;

        BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));

        stack_left = sp - (unsigned long)stack_info.begin;
+
+#ifdef CONFIG_VMAP_STACK
+       /*
+        * If alloca oversteps the thread stack boundary, we touch the guard
+        * page provided by VMAP_STACK to trigger handle_stack_overflow().
+        */
+       if (size >= stack_left)
+               *(stack_info.begin - 1) = 42;
+#else

On arm64, this won't trigger our stack overflow handler, unless the SP
is already very close to the boundary.

Please just use BUG(). If there is an issue on x86, it would be good to
solve that in the x86 code.

quoted

        BUG_ON(stack_left < MIN_STACK_LEFT ||
                                size >= stack_left - MIN_STACK_LEFT);

I really don't think we should bother with this arbitrary offset at all.

Thanks. I agree with all your points.

I wrote a third lkdtm test for STACKLEAK which runs deep recursion with alloca.
If I have just BUG_ON(size >= stack_left) in check_alloca(), I get the following
nice report without any trouble:

[    8.407261] lkdtm: Performing direct entry STACKLEAK_RECURSION_WITH_ALLOCA
[    8.408641] lkdtm: checking unused part of the thread stack (15744 bytes)...
[    8.409936] lkdtm: first 744 bytes are unpoisoned
[    8.410751] lkdtm: the rest of the thread stack is properly erased
[    8.411760] lkdtm: try to overflow the thread stack using recursion & alloca
[    8.412914] BUG: stack guard page was hit at 00000000b993c2bc (stack is 00000000764adcd4..000000005b443f11)
[    8.414471] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
[    8.415409] Dumping ftrace buffer:
[    8.415907]    (ftrace buffer empty)
[    8.416404] Modules linked in: lkdtm
[    8.416905] CPU: 0 PID: 2664 Comm: sh Not tainted 4.17.0-rc3+ #39
[    8.417766] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    8.419088] RIP: 0010:do_error_trap+0x31/0x130
[    8.419647] RSP: 0018:ffffc900009b3fc0 EFLAGS: 00010046
[    8.420263] RAX: 0000000000000000 RBX: ffffc900009b4078 RCX: 0000000000000006
[    8.421322] RDX: ffffffff81fdbe4d RSI: 0000000000000000 RDI: ffffc900009b4078
[    8.422837] RBP: 0000000000000006 R08: 0000000000000004 R09: 0000000000000000
[    8.425095] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[    8.427365] R13: ffffffff81fdbe4d R14: 0000000000000000 R15: 0000000000000000
[    8.430111] FS:  00007f7c340c1700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    8.432515] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.433132] CR2: ffffc900009b3fb8 CR3: 000000007b330000 CR4: 00000000000006f0
[    8.433904] Call Trace:
[    8.434180]  invalid_op+0x14/0x20
[    8.434546] RIP: 0010:check_alloca+0x8e/0xa0
[    8.434995] RSP: 0018:ffffc900009b4128 EFLAGS: 00010283
[    8.435555] RAX: 0000000000000128 RBX: 0000000000000190 RCX: 0000000000000001
[    8.436479] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffc900009b4128
[    8.437871] RBP: ffffc900009b4180 R08: 000000000000018f R09: 0000000000000007
[    8.438661] R10: 0000000000000000 R11: 0000000000000030 R12: ffff88007a626000
[    8.439433] R13: 0000000001cf5610 R14: 0000000000000020 R15: ffffc900009b7f08
[    8.440329]  ? check_alloca+0x64/0xa0
[    8.440845]  do_alloca+0x20/0x60 [lkdtm]
[    8.441937]  recursion+0xa0/0xd0 [lkdtm]
[    8.443370]  ? vsnprintf+0xf2/0x4b0
[    8.444289]  ? get_stack_info+0x32/0x160
[    8.445359]  ? check_alloca+0x64/0xa0
[    8.445995]  ? do_alloca+0x20/0x60 [lkdtm]
[    8.446449]  recursion+0xbb/0xd0 [lkdtm]
[    8.446881]  ? vsnprintf+0xf2/0x4b0
[    8.447259]  ? get_stack_info+0x32/0x160
[    8.447693]  ? check_alloca+0x64/0xa0
[    8.448088]  ? do_alloca+0x20/0x60 [lkdtm]
[    8.448539]  recursion+0xbb/0xd0 [lkdtm]
...

It seems that previously I was very "lucky" to accidentally have those MIN_STACK_LEFT,
call trace depth and oops=panic together to experience a hang on stack overflow
during BUG().


When I run my test in a loop _without_ VMAP_STACK, I manage to corrupt the neighbour
processes with BUG() handling overstepping the stack boundary. It's a pity, but
I have an idea.

In kernel/sched/core.c we already have:

#ifdef CONFIG_SCHED_STACK_END_CHECK
  	if (task_stack_end_corrupted(prev))
		panic("corrupted stack end detected inside scheduler\n");
#endif

So what would you think if I do the following in check_alloca():

	if (size >= stack_left) {
#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
		panic("alloca over the kernel stack boundary\n");
#else
		BUG();
#endif

I think that fits well to the CONFIG_SCHED_STACK_END_CHECK policy.

Best regards,
Alexander

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help