Re: [PATCH] x86/suspend: fix false positive KASAN warning on suspend/resume

From: Josh Poimboeuf <hidden>
Date: 2016-12-01 17:56:15
Also in: lkml

On Thu, Dec 01, 2016 at 06:47:07PM +0100, Dmitry Vyukov wrote:

On Thu, Dec 1, 2016 at 6:34 PM, Josh Poimboeuf [off-list ref] wrote:

quoted

On 12/01/2016 02:10 AM, Josh Poimboeuf wrote:

quoted

Resuming from a suspend operation is showing a KASAN false positive
warning:

quoted

KASAN instrumentation poisons the stack when entering a function and
unpoisons it when exiting the function.  However, in the suspend path,
some functions never return, so their stack never gets unpoisoned,
resulting in stale KASAN shadow data which can cause false positive
warnings like the one above.

Reported-by: Scott Bauer <redacted>
Tested-by: Scott Bauer <redacted>
Signed-off-by: Josh Poimboeuf <redacted>
---
 arch/x86/kernel/acpi/sleep.c | 3 +++
 include/linux/kasan.h        | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 4858733..62bd046 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c

@@ -115,6 +115,9 @@ int x86_acpi_suspend_lowlevel(void)
  pause_graph_tracing();
  do_suspend_lowlevel();
  unpause_graph_tracing();
+
+ kasan_unpoison_stack_below_sp();
+

I think this might be too late. We may hit stale poison in the first C function called
after resume (restore_processor_state()). Thus the shadow must be unpoisoned prior such call,
i.e. somewhere in do_suspend_lowlevel() after .Lresume_point.

Yeah, I think you're right.  Will spin a v2.

So I tried calling kasan_unpoison_task_stack_below() from
do_suspend_lowlevel(), but it hung on the resume.  Presumably because
restore_processor_state() does some important setup which would be
needed before calling into kasan_unpoison_task_stack_below().  For
example, setting up the gs register.  So it's a bit of a catch-22.

It could probably be fixed properly by rewriting do_suspend_lowlevel()
to call restore_processor_state() with the temporary stack before
switching to the original stack and doing the unpoison.

(And there are some other issues with do_suspend_lowlevel() and I'd love
to try taking a scalpel to it.  But I have too many knives in the air
already to want to try to attempt that right now...)

Unless somebody else wants to take a stab at it, my original patch is
probably good enough for now, since restore_processor_state() doesn't
seem to be triggering any KASAN warnings.

restore_processor_state/__restore_processor_state does not seem to
have any local variables, so KASAN does not do any stack checks there.

Actually, looking at the object code, it uses a lot of stack space and
has several calls to __asan_report_load*() functions.  Probably due to
inlining of other functions which have stack variables.

That can be loads of heap variables (or other non-stack data). KASAN
will emit these checks for lots of loads, but they don't necessary go
to stack.

I also see the stack poisoning instructions:

 54f:   49 c1 ee 03             shr    $0x3,%r14
 553:   4c 01 f0                add    %r14,%rax
 556:   c7 00 f1 f1 f1 f1       movl   $0xf1f1f1f1,(%rax)
 55c:   c7 40 04 00 00 f4 f4    movl   $0xf4f40000,0x4(%rax)
 563:   c7 40 08 f3 f3 f3 f3    movl   $0xf3f3f3f3,0x8(%rax)

OK, then we are in trouble potentially.
It may work as long as as the stack region that is used for local vars
in restore_processor_state() does not contain any stale poisoning. But
it can break at any moment.

Have you tried kasan_unpoison_task_stack_below() or kasan_unpoison_shadow()?
I can see how kasan_unpoison_task_stack_below() can hang (it at least
uses current). But kasan_unpoison_shadow() is quite trivial, it
computes shadow address with simple math and writes zeroes there.

Good idea, I'll give kasan_unpoison_shadow() a shot.

-- 
Josh

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help