Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
From: Jiri Olsa <hidden>
Date: 2024-05-03 19:18:30
Also in:
bpf, linux-api, linux-trace-kernel, lkml
Subsystem:
the rest, uprobes, x86 architecture (32-bit and 64-bit) · Maintainers:
Linus Torvalds, Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
On Fri, May 03, 2024 at 03:53:15PM +0000, Edgecombe, Rick P wrote:
On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote:quoted
On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:quoted
On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:quoted
Adding uretprobe syscall instead of trap to speed up return probe. At the moment the uretprobe setup/path is: - install entry uprobe - when the uprobe is hit, it overwrites probed function's return address on stack with address of the trampoline that contains breakpoint instruction - the breakpoint trap code handles the uretprobe consumers execution and jumps back to original return addressHi, I worked on the x86 shadow stack support. I didn't know uprobes did anything like this. In hindsight I should have looked more closely. The current upstream behavior is to overwrite the return address on the stack? Stupid uprobes question - what is actually overwriting the return address on the stack? Is it the kernel? If so perhaps the kernel could just update the shadow stack at the same time.
yes, it's in kernel - arch_uretprobe_hijack_return_addr .. so I guess we need to update the shadow stack with the new return value as well
quoted
quoted
quoted
This patch replaces the above trampoline's breakpoint instruction with new ureprobe syscall call. This syscall does exactly the same job as the trap with some more extra work: - syscall trampoline must save original value for rax/r11/rcx registers on stack - rax is set to syscall number and r11/rcx are changed and used by syscall instruction - the syscall code reads the original values of those registers and restore those values in task's pt_regs area - only caller from trampoline exposed in '[uprobes]' is allowed, the process will receive SIGILL signal otherwiseDid you consider shadow stacks? IIRC we currently have userspace shadow stack support available, and that will utterly break all of this.nope.. I guess it's the extra ret instruction in the trampoline that would make it crash?The original behavior seems problematic for shadow stack IIUC. I'm not sure of the additional breakage with the new behavior.
I can see it's broken also for current uprobes
Roughly, how shadow stack works is there is an additional protected stack for the app thread. The HW pushes to from the shadow stack with CALL, and pops from it with RET. But it also continues to push and pop from the normal stack. On pop, if the values don't match between the two stacks, an exception is generated. The whole point is to prevent the app from overwriting its stack return address to return to random places. Userspace cannot (normally) write to the shadow stack, but the kernel can do this or adust the SSP (shadow stack pointer). So in the kernel (for things like sigreturn) there is an ability to do what is needed. Ptracers also can do things like this.
hack below seems to fix it for the current uprobe setup, we need similar fix for the uretprobe syscall trampoline setup jirka ---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..99a0948a3b79 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h@@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon void shstk_free(struct task_struct *p); int setup_signal_shadow_stack(struct ksignal *ksig); int restore_signal_shadow_stack(void); +void uprobe_change_stack(unsigned long addr); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..d2c4dbe5843c 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c@@ -577,3 +577,11 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) return wrss_control(true); return -EINVAL; } + +void uprobe_change_stack(unsigned long addr) +{ + unsigned long ssp; + + ssp = get_user_shstk_addr(); + write_user_shstk_64((u64 __user *)ssp, (u64)addr); +}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..88afbeaacb8f 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c@@ -348,7 +348,7 @@ void *arch_uprobe_trampoline(unsigned long *psize) * only for native 64-bit process, the compat process still uses * standard breakpoint. */ - if (user_64bit_mode(regs)) { + if (0 && user_64bit_mode(regs)) { *psize = uretprobe_syscall_end - uretprobe_syscall_entry; return uretprobe_syscall_entry; }
@@ -1191,8 +1191,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs return orig_ret_vaddr; nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize); - if (likely(!nleft)) + if (likely(!nleft)) { + uprobe_change_stack(trampoline_vaddr); return orig_ret_vaddr; + } if (nleft != rasize) { pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",