Thread (23 messages) 23 messages, 3 authors, 2024-08-07

Re: [PATCH RFT v7 9/9] selftests/clone3: Test shadow stack support

From: Kees Cook <kees@kernel.org>
Date: 2024-08-07 18:23:05
Also in: linux-kselftest, lkml

On Wed, Aug 07, 2024 at 01:39:27PM +0100, Mark Brown wrote:
On Tue, Aug 06, 2024 at 10:08:44PM -0700, Kees Cook wrote:
quoted
On Tue, Aug 06, 2024 at 04:10:02PM +0100, Mark Brown wrote:
quoted
quoted
quoted
  # Running test 'Shadow stack with no token'
quoted
It took me a while to figure out where a thread switches shstk (even
without this series):
quoted
kernel_clone, copy_process, copy_thread, fpu_clone, update_fpu_shstk
(and shstk_alloc_thread_stack is called just before update_fpu_shstk).
quoted
I don't understand the token consumption in arch_shstk_post_fork(). This
wasn't needed before with the fixed-size new shstk, why is it needed
now?
Concerns were raised on earlier rounds of review that since instead of
allocating the shadow stack as part of creating the new thread we are
using a previously allocated shadow stack someone could use this as part
of an exploit.  You could just jump on top of any existing shadow stack
and cause writes to it.
quoted
Anyway, my attempt to trace the shstk changes for the test:
quoted
write(1, "TAP version 13\n", 15)        = 15
write(1, "1..2\n", 5)                   = 5
clone3({flags=0, exit_signal=18446744073709551615, stack=NULL, stack_size=0}, 104) = -1 EINVAL (Invalid argument)
write(1, "# clone3() syscall supported\n", 29) = 29
map_shadow_stack(NULL, 4096, 0)         = 125837480497152
write(1, "# Shadow stack supportd\n", 24) = 24
write(1, "# Running test 'Shadow stack wit"..., 44) = 44
getpid()                                = 4943
write(1, "# [4943] Trying clone3() with fl"..., 51) = 51
map_shadow_stack(NULL, 4096, 0)         = 125837480488960
clone3({flags=CLONE_VM, exit_signal=SIGCHLD, stack=NULL, stack_size=0, /* bytes 88..103 */ "\x00\xf0\x52\xd2\x72\x72\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00"} => {/* bytes 88..103 */ "\x00\xf0\x52\xd2\x72\x72\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00"}, 104) = 4944
getpid()                                = 4943
write(1, "# I am the parent (4943). My chi"..., 49strace: Process 4944 attached
) = 49
[pid  4944] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_CPERR, si_addr=NULL} ---
[pid  4943] wait4(-1,  <unfinished ...>
[pid  4944] +++ killed by SIGSEGV (core dumped) +++
So we created the thread, then before we get to the wait4() in the
parent we start delivering a SEGV_CPERR to the child.  The flow for the
child is as expected.
quoted
<... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], __WALL, NULL) = 4944
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=4944, si_uid=0, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7272d21fffe8} ---
+++ killed by SIGSEGV (core dumped) +++
Then the parent gets an ordinary segfault, not a shadow stack specific
one, like some memory got deallocated underneath it or a pointer got
corrupted.
quoted
[  569.153288] shstk_setup: clone3[4943] ssp:7272d2200000
[  569.153998] process: copy_thread: clone3[4943] new_ssp:7272d2530000
[  569.154002] update_fpu_shstk: clone3[4943] ssp:7272d2530000
[  569.154008] shstk_post_fork: clone3[4944]
[  569.154011] shstk_post_fork: clone3[4944] sending SIGSEGV post fork
quoted
I don't see an update_fpu_shstk for 4944? Should I with this test?
I'd only expect to see one update, my understanding is that that update
is for the child but happening in the context of the parent as the hild
is not yet started.
What's weird here that I don't understand is that the parent is 4943, so
this report makes sense:
quoted
[  569.153288] shstk_setup: clone3[4943] ssp:7272d2200000
The child is 4944, yet I see:
quoted
[  569.153998] process: copy_thread: clone3[4943] new_ssp:7272d2530000
[  569.154002] update_fpu_shstk: clone3[4943] ssp:7272d2530000
These map to my logging:

copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
	...
	new_ssp = shstk_alloc_thread_stack(p, args);
	pr_err("%s: %s[%d] new_ssp:%lx\n", __func__, p->comm, task_pid_nr(p), new_ssp);

and

update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
	...
        xstate->user_ssp = (u64)ssp;
	pr_err("%s: %s[%d] ssp:%lx\n", __func__, dst->comm, task_pid_nr(dst), ssp);

The child should be "p" (and "dst") here -- stuff is being copied from
current to p, but p is reporting itself as 4943 here? (Oh, this is
reporting pid, not tid... I bet that's what I've got wrong.)
quoted hunk ↗ jump to hunk
Does this help:
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 27acbdf44c5f..d7005974aff5 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -258,6 +258,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
 	if (args->shadow_stack) {
 		addr = args->shadow_stack;
 		size = args->shadow_stack_size;
+		shstk->base = 0;
+		shstk->size = 0;
 	} else {
 		/*
 		 * For CLONE_VFORK the child will share the parents
I'll fix my reporting and give this patch a try too. Thanks!

-Kees

-- 
Kees Cook
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help