Re: [PATCH] Reset task stack state in bringup_cpu()
From: Mark Rutland <mark.rutland@arm.com>
Date: 2021-11-17 11:52:44
On Tue, Nov 16, 2021 at 11:31:40AM -0500, Qian Cai wrote:
> On Mon, Nov 15, 2021 at 11:33:10AM +0000, Mark Rutland wrote:
> > To hot unplug a CPU, the idle task on that CPU calls a few layers of C
> > code before finally leaving the kernel. When KASAN is in use, poisoned
> > shadow is left around for each of the active stack frames. When shadow
> > call stacks are in use, the task's SCS SP is left pointing at an
> > arbitrary point within the task's shadow call stack.
> >
> > When an offlined CPU is hotplugged back into the kernel, this stale
> > state can adversely affect the newly onlined CPU. Stale KASAN shadow
> > can alias new stackframes and result in bogus KASAN warnings. A stale
> > SCS SP is effectively a memory leak, and prevents a portion of the
> > shadow call stack being used. Across a number of hotplug cycles the
> > task's entire shadow call stack can become unusable.
> >
> > We previously fixed the KASAN issue in commit:
> >
> >   e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")
> >
> > In commit:
> >
> >   f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
> >
> > ... we broke both KASAN and SCS, with SCS being fixed up in commit:
> >
> >   63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")
> >
> > ... but as this runs in the context of the idle task being offlined
> > it's potentially fragile.
> >
> > Fix both of these consistently and more robustly by resetting the SCS
> > SP and KASAN shadow immediately before we online a CPU. This ensures
> > the idle task always has a consistent state, and removes the need to
> > do so when initializing an idle task or when unplugging an idle task.
> >
> > I've tested this with both GCC and clang, with relevant options
> > enabled, offlining and onlining CPUs with:
> >
> > | while true; do
> > |   for C in /sys/devices/system/cpu/cpu*/online; do
> > |     echo 0 > $C;
> > |     echo 1 > $C;
> > |   done
> > | done
> >
> > Link: https://lore.kernel.org/lkml/20211012083521.973587-1-woodylin@google.com/
> > Link: https://lore.kernel.org/linux-arm-kernel/YY9ECKyPtDbD9q8q@qian-HP-Z2-SFF-G5-Workstation/
> > Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
> > Reported-by: Qian Cai <redacted>
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>
> Thanks for fixing this quickly, Mark. Triggering a use-after-free in
> user namespace, but I don't think it is related. I'll investigate that
> first since it is blocking the rest of regression testing.

Cool; are you happy to provide a Tested-by tag for this patch? :)

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
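
For readers following the thread, the SCS leak described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch and not kernel code: the struct, the function names, and the constants SCS_SLOTS and OFFLINE_DEPTH are all hypothetical, chosen to show how a stale SCS SP eats into the shadow call stack on every hotplug cycle unless it is reset before the CPU is onlined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity of the idle task's shadow call stack, in slots. */
#define SCS_SLOTS 16
/* Hypothetical number of frames still live when the CPU leaves the kernel. */
#define OFFLINE_DEPTH 3

/* Toy stand-in for the idle task: only tracks the SCS stack pointer. */
struct idle_task_model {
	size_t scs_sp;	/* index of the next free shadow call stack slot */
};

/*
 * Model of the offline path: the idle task calls a few layers of C code
 * and never returns, so the pushed return addresses are never popped.
 * Returns false once the shadow call stack has no room left.
 */
static bool offline_cpu(struct idle_task_model *t)
{
	if (t->scs_sp + OFFLINE_DEPTH > SCS_SLOTS)
		return false;
	t->scs_sp += OFFLINE_DEPTH;
	assert(t->scs_sp <= SCS_SLOTS);
	return true;
}

/*
 * Model of the online path. With reset == true the stale SCS SP is
 * discarded before the CPU comes up, which is the idea behind the patch.
 */
static void online_cpu(struct idle_task_model *t, bool reset)
{
	if (reset)
		t->scs_sp = 0;
}

/* Count how many offline/online cycles complete, up to max_cycles. */
static int run_cycles(bool reset, int max_cycles)
{
	struct idle_task_model t = { .scs_sp = 0 };
	int done = 0;

	for (int i = 0; i < max_cycles; i++) {
		if (!offline_cpu(&t))
			break;
		online_cpu(&t, reset);
		done++;
	}
	return done;
}
```

With these made-up constants, run_cycles(false, 100) stops after 5 cycles because the stale SP has consumed the stack, while run_cycles(true, 100) completes all 100. The real kernel numbers depend on the SCS size and the depth of the offline call path.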