Thread (64 messages), 8 authors, 2025-09-10

Re: [PATCHv6 perf/core 10/22] uprobes/x86: Add support to optimize uprobes

From: Peter Zijlstra <peterz@infradead.org>
Date: 2025-08-19 19:15:28
Also in: bpf, lkml

On Sun, Jul 20, 2025 at 01:21:20PM +0200, Jiri Olsa wrote:
+static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
+{
+	struct __packed __arch_relative_insn {
+		u8 op;
+		s32 raddr;
+	} *call = (struct __arch_relative_insn *) insn;
Not something you need to clean up now I suppose, but we could do with
unifying this thing. We have a bunch of instances around.
+
+	if (!is_call_insn(insn))
+		return false;
+	return __in_uprobe_trampoline(vaddr + 5 + call->raddr);
+}
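
For illustration only, the target computation in __is_optimized() (address of the next instruction plus the signed 32-bit displacement) can be reproduced in a small userspace sketch. The names call_target, CALL_OPCODE and CALL_INSN_SIZE are hypothetical and not part of the patch; a little-endian host is assumed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CALL_OPCODE	0xe8	/* x86 near call with rel32 operand */
#define CALL_INSN_SIZE	5	/* 1 opcode byte + 4 displacement bytes */

/*
 * Hypothetical helper, not from the patch: decode the target of a
 * 5-byte rel32 call located at vaddr. The displacement is relative to
 * the end of the instruction, hence vaddr + 5 + raddr.
 */
static bool call_target(const uint8_t *insn, uint64_t vaddr, uint64_t *target)
{
	int32_t raddr;

	if (insn[0] != CALL_OPCODE)
		return false;

	/* memcpy avoids unaligned access; assumes a little-endian host. */
	memcpy(&raddr, insn + 1, sizeof(raddr));
	*target = vaddr + CALL_INSN_SIZE + (int64_t)raddr;
	return true;
}
```

With the bytes e8 10 00 00 00 at address 0x1000 this yields 0x1015; a displacement of -5 makes the call target its own address.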
+void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+	struct mm_struct *mm = current->mm;
+	uprobe_opcode_t insn[5];
+
+	/*
+	 * Do not optimize if shadow stack is enabled, the return address hijack
+	 * code in arch_uretprobe_hijack_return_addr updates wrong frame when
+	 * the entry uprobe is optimized and the shadow stack crashes the app.
+	 */
+	if (shstk_is_enabled())
+		return;
Kernel should be able to fix up userspace shadow stack just fine.
+	if (!should_optimize(auprobe))
+		return;
+
+	mmap_write_lock(mm);
+
+	/*
+	 * Check if some other thread already optimized the uprobe for us,
+	 * if it's the case just go away silently.
+	 */
+	if (copy_from_vaddr(mm, vaddr, &insn, 5))
+		goto unlock;
+	if (!is_swbp_insn((uprobe_opcode_t*) &insn))
+		goto unlock;
+
+	/*
+	 * If we fail to optimize the uprobe we set the fail bit so the
+	 * above should_optimize will fail from now on.
+	 */
+	if (__arch_uprobe_optimize(auprobe, mm, vaddr))
+		set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags);
+
+unlock:
+	mmap_write_unlock(mm);
+}
+
+static bool can_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+	if (memcmp(&auprobe->insn, x86_nops[5], 5))
+		return false;
+	/* We can't do cross page atomic writes yet. */
+	return PAGE_SIZE - (vaddr & ~PAGE_MASK) >= 5;
+}
This seems needlessly restrictive. Something like:

static bool is_nop5(const char *buf)
{
	struct insn insn;
	int ret;

	ret = insn_decode_kernel(&insn, buf);
	if (ret < 0)
		return false;

	if (insn.length != 5)
		return false;

	/* 0f 1f is the multi-byte NOP opcode */
	if (insn.opcode.bytes[0] != 0x0f ||
	    insn.opcode.bytes[1] != 0x1f)
		return false;

	return true;
}

Should do I suppose. Anyway, I think something like:

  f0 0f 1f 44 00 00	lock nopl 0(%eax, %eax, 1)

is a valid NOP5 at +1 and will 'optimize' and result in:

  f0 e8 disp32		lock call disp32

which will #UD.

But this is nearly unfixable. Just doing my best to find weirdo cases
;-)
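
A minimal userspace sketch of the corner case described above, assuming the canonical 5-byte NOP encoding 0f 1f 44 00 00. The helpers looks_like_nop5 and patch_call are hypothetical names for illustration, not kernel functions. A byte-wise check at offset +1 accepts the tail of the lock-prefixed instruction, and patching it leaves the f0 prefix in front of the new call opcode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Canonical 5-byte NOP: nopl 0x0(%rax,%rax,1) */
static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical byte-wise check; it cannot see what precedes p. */
static bool looks_like_nop5(const uint8_t *p)
{
	return memcmp(p, nop5, sizeof(nop5)) == 0;
}

/* Hypothetical patch step: overwrite 5 bytes with "call rel32". */
static void patch_call(uint8_t *p, int32_t disp)
{
	p[0] = 0xe8;
	/* Stores disp in host byte order; little-endian host assumed. */
	memcpy(p + 1, &disp, sizeof(disp));
}
```

Starting from the bytes f0 0f 1f 44 00 00, looks_like_nop5(buf + 1) returns true, and patch_call(buf + 1, disp) produces f0 e8 followed by the displacement, which is the lock call sequence from the message. The sketch only reproduces the byte pattern; it does not attempt to detect the prefix, since the code has no reliable way to know where the preceding instruction starts.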