Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks
From: Samuel Holland <samuel@sholland.org>
Date: 2022-10-26 04:27:48
Also in:
lkml
On 9/21/22 00:17, Christophe Leroy wrote:
Le 21/09/2022 à 05:33, Samuel Holland a écrit :quoted
On 9/19/22 07:37, Michael Ellerman wrote:quoted
Christophe Leroy [off-list ref] writes:quoted
Le 16/09/2022 à 07:05, Samuel Holland a écrit :quoted
With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible to switch away from a task inside copy_{from,to}_user. This left the CPU with userspace access enabled until after the next IRQ or privilege level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when switching back to the original task, the userspace access would fault:This is not supposed to happen. You never switch away from a task magically. Task switch will always happen in an interrupt, that means copy_{from,to}_user() get interrupted.Unfortunately this isn't true when CONFIG_PREEMPT=y. We can switch away without an interrupt via: __copy_tofrom_user() -> __copy_tofrom_user_power7() -> exit_vmx_usercopy() -> preempt_enable() -> __preempt_schedule() -> preempt_schedule() -> preempt_schedule_common() -> __schedule() I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are all on Power8, which is a bit of an oversight on my part. And clearly no one else tests it, until now :) I think the root of our problem is that our KUAP lock/unlock is at too high a level, ie. we do it in C around the low-level copy to/from. eg: static inline unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n) { unsigned long ret; allow_write_to_user(to, n); ret = __copy_tofrom_user(to, (__force const void __user *)from, n); prevent_write_to_user(to, n); return ret; } There's a reason we did that, which is that we have various different KUAP methods on different platforms, not a simple instruction like other arches. But that means we have that exit_vmx_usercopy() being called deep in the guts of __copy_tofrom_user(), with KUAP disabled, and then we call into the preempt machinery and eventually schedule. I don't see an easy way to fix that "properly", it would be a big change to all platforms to push the KUAP save/restore down into the low level asm code. But I think the patch below does fix it, although it abuses things a little. Namely it only works because the 64s KUAP code can handle a double call to prevent, and doesn't need the addresses or size for the allow. Still I think it might be our best option for an easy fix. Samuel, can you try this on your system and check it works for you?It looks like your patch works. Thanks for the correct fix!Instead of the patch from Michael, could you try by replacing preempt_enable() by preempt_enable_no_resched() in exit_vmx_usercopy() ?
I finally got a chance to test this, and the simpler fix of using preempt_enable_no_resched() works as well.
quoted
I replaced my patch with the one below, and enabled CONFIG_PPC_KUAP_DEBUG=y, and I was able to do several kernel builds without any crashes or splats in dmesg.Did you try CONFIG_PPC_KUAP_DEBUG without the patch ? Did it detect any problem ?
I believe I did at one point, and it did not detect anything. Regards, Samuel