Thread: 9 messages, 5 authors, 2023-03-11

Re: [PATCH] Reset task stack state in bringup_cpu()

From: David Woodhouse <dwmw2@infradead.org>
Date: 2023-03-11 10:52:47
Also in: linux-um, lkml

On Mon, 15 Nov 2021 at 11:33:10 +0000, Mark Rutland wrote:
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames. When shadow
call stacks are in use, the task's SCS SP is left pointing at an
arbitrary point within the task's shadow call stack.

When an offlined CPU is hotplugged back into the kernel, this stale
state can adversely affect the newly onlined CPU. Stale KASAN shadow can
alias new stack frames and result in bogus KASAN warnings. A stale SCS SP
is effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the task's entire
shadow call stack can become unusable.
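
The leak described above can be illustrated with a small userspace model
in C. This is only a sketch, not kernel code: the mock_ names, the
128-slot stack size and the fixed call depth are assumptions made for
illustration. Each offline pushes a few return addresses that are never
popped, so without a reset the free space shrinks on every cycle.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SCS_SLOTS 128 /* assumed size, for illustration only */

/* Hypothetical model of a task's shadow call stack: sp is the next free slot. */
struct mock_scs {
	size_t sp;
};

size_t mock_scs_free_slots(const struct mock_scs *s)
{
	assert(s->sp <= MOCK_SCS_SLOTS);
	return MOCK_SCS_SLOTS - s->sp;
}

/*
 * Model of the offline path: `depth` frames are pushed and the CPU leaves
 * the kernel without returning through them, so nothing is popped.
 * Returns 0 on success, -1 if the shadow call stack would overflow.
 */
int mock_offline(struct mock_scs *s, size_t depth)
{
	if (depth > mock_scs_free_slots(s))
		return -1;
	s->sp += depth;
	return 0;
}

/* Model of resetting the SCS SP back to the base of the stack. */
void mock_scs_reset(struct mock_scs *s)
{
	s->sp = 0;
}

/* Count how many hotplug cycles complete before the stack is exhausted. */
size_t mock_cycles_survived(size_t depth, int reset_on_online,
			    size_t max_cycles)
{
	struct mock_scs s = { 0 };
	size_t n;

	for (n = 0; n < max_cycles; n++) {
		if (reset_on_online)
			mock_scs_reset(&s);
		if (mock_offline(&s, depth) != 0)
			break;
	}
	return n;
}
```

In this model, a call depth of 4 with no reset exhausts the stack after
32 cycles, while resetting before each online lets every cycle complete.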

We previously fixed the KASAN issue in commit:

  e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")

In commit:

  f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")

... we broke both KASAN and SCS, with SCS being fixed up in commit:

  63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")

... but as this runs in the context of the idle task being offlined it's
potentially fragile.

Fix both of these consistently and more robustly by resetting the SCS SP
and KASAN shadow immediately before we online a CPU. This ensures the
idle task always has a consistent state, and removes the need to do so
when initializing an idle task or when unplugging an idle task.

I've tested this with both GCC and clang, with relevant options enabled,
offlining and onlining CPUs with:
while true; do
  for C in /sys/devices/system/cpu/cpu*/online; do
    echo 0 > $C;
    echo 1 > $C;
  done
done
Link: https://lore.kernel.org/lkml/20211012083521.973587-1-woodylin@google.com/
Link: https://lore.kernel.org/linux-arm-kernel/YY9ECKyPtDbD9q8q@qian-HP-Z2-SFF-G5-Workstation/
Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <redacted>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <redacted>
Cc: Will Deacon <will@kernel.org>
Cc: Woody Lin <redacted>

---
 kernel/cpu.c        | 7 +++++++
 kernel/sched/core.c | 4 ----
 2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 192e43a87407..407a2568f35e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,6 +31,7 @@
 #include <linux/smpboot.h>
 #include <linux/relay.h>
 #include <linux/slab.h>
+#include <linux/scs.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/cpuset.h>
 
@@ -588,6 +589,12 @@ static int bringup_cpu(unsigned int cpu)
        int ret;
 
        /*
+        * Reset stale stack state from the last time this CPU was online.
+        */
+       scs_task_reset(idle);
+       kasan_unpoison_task_stack(idle);

Hm, in the !CONFIG_GENERIC_SMP_IDLE_THREAD case, idle_thread_get() will
have returned NULL. Won't these then crash?

Admittedly that seems to be *only* for UM, as all other architectures
with SMP seem to set CONFIG_GENERIC_SMP_IDLE_THREAD.
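
To make the concern concrete, here is a minimal userspace sketch in C,
not kernel code. The mock_ names are hypothetical stand-ins for
idle_thread_get(), scs_task_reset() and kasan_unpoison_task_stack(), and
the sketch assumes both helpers dereference the task pointer they are
given. It shows one possible shape of a guard that skips the reset when
there is no generic idle thread; whether a guard or moving the calls
elsewhere is the right fix is a separate question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; not the kernel's layout. */
struct mock_task {
	int scs_resets;
	int kasan_unpoisons;
};

struct mock_task mock_idle;

/*
 * Mock of idle_thread_get(): returns NULL when generic SMP idle threads
 * are not configured, as described for !CONFIG_GENERIC_SMP_IDLE_THREAD.
 */
struct mock_task *mock_idle_thread_get(int generic_smp_idle_thread)
{
	return generic_smp_idle_thread ? &mock_idle : NULL;
}

/* Stand-in for scs_task_reset(); assumed to dereference its argument. */
void mock_scs_task_reset(struct mock_task *t)
{
	assert(t != NULL); /* models the crash on a NULL idle task */
	t->scs_resets++;
}

/* Stand-in for kasan_unpoison_task_stack(); same assumption. */
void mock_kasan_unpoison_task_stack(struct mock_task *t)
{
	assert(t != NULL);
	t->kasan_unpoisons++;
}

/*
 * Guarded reset: returns 1 if the stack state was reset, 0 if the reset
 * was skipped because there is no idle task to operate on.
 */
int mock_bringup_reset(struct mock_task *idle)
{
	if (!idle)
		return 0;
	mock_scs_task_reset(idle);
	mock_kasan_unpoison_task_stack(idle);
	return 1;
}
```

Without the NULL check in mock_bringup_reset(), the first helper would
be handed a NULL pointer in the no-idle-thread case, which is the
failure being asked about.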

cf.
https://lore.kernel.org/all/5a9d6d4aef78de6fc8a2cfc62922d06617cbe109.camel@infradead.org/
https://lore.kernel.org/all/f59ac9c08122b338bbda137d29013add0e194933.camel@infradead.org/

