Thread (29 messages) 29 messages, 8 authors, 2024-09-10

Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe

From: Jiri Olsa <hidden>
Date: 2024-05-03 19:18:30
Also in: bpf, linux-api, linux-trace-kernel, lkml
Subsystem: the rest, uprobes, x86 architecture (32-bit and 64-bit) · Maintainers: Linus Torvalds, Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen

On Fri, May 03, 2024 at 03:53:15PM +0000, Edgecombe, Rick P wrote:
On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote:
quoted
On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:
quoted
On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
quoted
Adding uretprobe syscall instead of trap to speed up return probe.

At the moment the uretprobe setup/path is:

   - install entry uprobe

   - when the uprobe is hit, it overwrites probed function's return
address
     on stack with address of the trampoline that contains breakpoint
     instruction

   - the breakpoint trap code handles the uretprobe consumers execution
and
     jumps back to original return address
Hi,

I worked on the x86 shadow stack support.

I didn't know uprobes did anything like this. In hindsight I should have looked
more closely. The current upstream behavior is to overwrite the return address
on the stack?

Stupid uprobes question - what is actually overwriting the return address on the
stack? Is it the kernel? If so perhaps the kernel could just update the shadow
stack at the same time.
yes, it's in kernel - arch_uretprobe_hijack_return_addr .. so I guess
we need to update the shadow stack with the new return value as well
quoted
quoted
quoted
This patch replaces the above trampoline's breakpoint instruction with new
ureprobe syscall call. This syscall does exactly the same job as the trap
with some more extra work:

   - syscall trampoline must save original value for rax/r11/rcx registers
     on stack - rax is set to syscall number and r11/rcx are changed and
     used by syscall instruction

   - the syscall code reads the original values of those registers and
     restore those values in task's pt_regs area

   - only caller from trampoline exposed in '[uprobes]' is allowed,
     the process will receive SIGILL signal otherwise
Did you consider shadow stacks? IIRC we currently have userspace shadow
stack support available, and that will utterly break all of this.
nope.. I guess it's the extra ret instruction in the trampoline that would
make it crash?
The original behavior seems problematic for shadow stack IIUC. I'm not sure of
the additional breakage with the new behavior.
I can see it's broken also for current uprobes
Roughly, how shadow stack works is there is an additional protected stack for
the app thread. The HW pushes to from the shadow stack with CALL, and pops from
it with RET. But it also continues to push and pop from the normal stack. On
pop, if the values don't match between the two stacks, an exception is
generated. The whole point is to prevent the app from overwriting its stack
return address to return to random places.

Userspace cannot (normally) write to the shadow stack, but the kernel can do
this or adust the SSP (shadow stack pointer). So in the kernel (for things like
sigreturn) there is an ability to do what is needed. Ptracers also can do things
like this.
hack below seems to fix it for the current uprobe setup,
we need similar fix for the uretprobe syscall trampoline setup

jirka


---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..99a0948a3b79 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+void uprobe_change_stack(unsigned long addr);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..d2c4dbe5843c 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,11 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 		return wrss_control(true);
 	return -EINVAL;
 }
+
+void uprobe_change_stack(unsigned long addr)
+{
+	unsigned long ssp;
+
+	ssp = get_user_shstk_addr();
+	write_user_shstk_64((u64 __user *)ssp, (u64)addr);
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..88afbeaacb8f 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -348,7 +348,7 @@ void *arch_uprobe_trampoline(unsigned long *psize)
 	 * only for native 64-bit process, the compat process still uses
 	 * standard breakpoint.
 	 */
-	if (user_64bit_mode(regs)) {
+	if (0 && user_64bit_mode(regs)) {
 		*psize = uretprobe_syscall_end - uretprobe_syscall_entry;
 		return uretprobe_syscall_entry;
 	}
@@ -1191,8 +1191,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
 		return orig_ret_vaddr;
 
 	nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-	if (likely(!nleft))
+	if (likely(!nleft)) {
+		uprobe_change_stack(trampoline_vaddr);
 		return orig_ret_vaddr;
+	}
 
 	if (nleft != rasize) {
 		pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help