Re: [RFC perf/core 05/11] uprobes: Add mapping for optimized uprobe trampolines
From: Mark Rutland <mark.rutland@arm.com>
Date: 2024-11-21 19:39:21
Also in:
bpf, lkml
[resending as I somehow messed up the 'From' header and got a tonne of bounces]

On Thu, Nov 21, 2024 at 08:47:56AM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 21, 2024 at 8:34 AM Peter Zijlstra [off-list ref] wrote:
> > Elsewhere in the thread Mark Rutland already noted that arm64 really
> > doesn't need or want this.
>
> Doesn't look like you've read what you quoted above.
> On arm64 the _HW_ cost may be the same.
> The _SW_ difference in handling trap vs syscall is real.
> I bet once uprobe syscall is benchmarked on arm64 there will be a delta.

I already pointed out in [1] that on arm64 we can make the trap case
*faster* than the syscall. If that's not already the case, there's only a
small amount of rework needed, (pulling BRK handling into entry-common.c),
which we want to do for other reasons anyway.

On arm64 I do not want the syscall; the trap is faster and simpler to
maintain.

Mark

[1] https://lore.kernel.org/lkml/ZzsRfhGSYXVK0mst@J2N7QTR9R3/