Thread: 51 messages, 7 authors, 2024-11-26

Re: [RFC perf/core 05/11] uprobes: Add mapping for optimized uprobe trampolines

From: Andrii Nakryiko <hidden>
Date: 2024-11-19 06:07:03
Also in: bpf, lkml

On Sat, Nov 16, 2024 at 1:44 PM Jiri Olsa [off-list ref] wrote:
On Thu, Nov 14, 2024 at 03:44:12PM -0800, Andrii Nakryiko wrote:
quoted
On Tue, Nov 5, 2024 at 8:33 AM Jiri Olsa [off-list ref] wrote:
quoted
On Tue, Nov 05, 2024 at 03:23:27PM +0100, Peter Zijlstra wrote:
quoted
On Tue, Nov 05, 2024 at 02:33:59PM +0100, Jiri Olsa wrote:
quoted
Add an interface for creating a special mapping for a user space page that
will be used as a placeholder for the uprobe trampoline in following changes.

The get_tramp_area(vaddr) function either finds a 'callable' page or creates
a new one. 'Callable' means the page is reachable by a call instruction (from
the vaddr argument), which each arch decides via the new
arch_uprobe_is_callable function.

The put_tramp_area function either drops the refcount or destroys the special
mapping, and all the maps are cleaned up when the process goes down.
In another thread somewhere, Andrii mentioned that Meta has executables
with more than 4G of .text. This isn't going to work for them, is it?
not if you can't reach the trampoline from the probed address
That specific example was about 1.5GB (though we might have bigger
.text, I didn't do exhaustive research). As Jiri said, this would be a
best-effort attempt to find the closest free mapping that stays within a
+/-2GB offset. If that fails, we would always fall back to slower
int3-based uprobing, yep.
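
To make the +/-2GB constraint concrete, here is a minimal sketch of the reachability test for a 5-byte call rel32 (opcode 0xE8 plus a signed 32-bit displacement measured from the end of the instruction). The helper name call_rel32_reaches is hypothetical and only illustrative; it is not the arch_uprobe_is_callable implementation from the patch, which may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 "call rel32": opcode 0xE8 followed by a 4-byte displacement. */
#define CALL_REL32_LEN 5

/*
 * Hypothetical helper (an assumption, not the patch's code): returns true
 * if a call rel32 placed at vaddr can reach tramp. The displacement is
 * relative to the address of the instruction that follows the call.
 */
static bool call_rel32_reaches(uint64_t vaddr, uint64_t tramp)
{
	int64_t disp = (int64_t)(tramp - (vaddr + CALL_REL32_LEN));

	/* The displacement must fit in a signed 32-bit immediate. */
	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

If no free mapping passes such a check for a given probed address, the fallback described above is the int3-based uprobe.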

Jiri, we could also have an option to support a 64-bit call, right? We'd
need nop9 for that, but it's an option as well for future-proofing this
approach, no?
hm, I don't think there's a call with a relative 64-bit offset
why do you need a relative, when you have 64 bits? ;) there is a call
to absolute address, no?
there's indirect call through register or address.. but I think we would
fit in nop10 with the indirect call through address
quoted
Also, can we somehow use fs/gs-based indirect calls/jumps to have a
guarantee that the offset is always small (<2GB away relative to the
base stored in fs/gs)? Not sure if this is feasible, but I thought it
would be good to bring this up, if only to confirm it doesn't work.

If a segment-based absolute call is somehow feasible, we can probably
simplify a bunch of stuff by allocating the trampoline eagerly, once,
somewhere high up next to the VDSO (or maybe even put it into the VDSO,
don't know).
yes, that would be convenient

jirka
quoted
Anyways, let's brainstorm if there are any clever alternatives here.

quoted
jirka