Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution

[PATCH v3 00/13] uprobes: RCU-protected hot path optimizations · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 01/13] uprobes: revamp uprobe refcounting and lifetime management · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 02/13] uprobes: protected uprobe lifetime with SRCU · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 03/13] uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
Re: [PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection · Jiri Olsa <hidden> · 2024-08-22
Re: [PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection · Andrii Nakryiko <hidden> · 2024-08-22
Re: [PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection · Jiri Olsa <hidden> · 2024-08-22
Re: [PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection · Andrii Nakryiko <hidden> · 2024-08-22
[PATCH v3 05/13] perf/uprobe: split uprobe_unregister() · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 06/13] rbtree: provide rb_find_rcu() / rb_find_add_rcu() · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 07/13] uprobes: perform lockless SRCU-protected uprobes_tree lookup · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH v3 08/13] uprobes: switch to RCU Tasks Trace flavor for better performance · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) · Oleg Nesterov <oleg@redhat.com> · 2024-08-19
Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) · Andrii Nakryiko <hidden> · 2024-08-19
Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) · Oleg Nesterov <oleg@redhat.com> · 2024-08-20
Re: [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) · Andrii Nakryiko <hidden> · 2024-08-20
[PATCH RFC v3 10/13] uprobes: implement SRCU-protected lifetime for single-stepped uprobe · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH RFC v3 11/13] mm: introduce mmap_lock_speculation_{start|end} · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
[PATCH RFC v3 12/13] mm: add SLAB_TYPESAFE_BY_RCU to files_cache · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
Re: [PATCH RFC v3 12/13] mm: add SLAB_TYPESAFE_BY_RCU to files_cache · Mateusz Guzik <hidden> · 2024-08-13
Re: [PATCH RFC v3 12/13] mm: add SLAB_TYPESAFE_BY_RCU to files_cache · Suren Baghdasaryan <surenb@google.com> · 2024-08-13
Re: [PATCH RFC v3 12/13] mm: add SLAB_TYPESAFE_BY_RCU to files_cache · Andrii Nakryiko <hidden> · 2024-08-13
[PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Andrii Nakryiko <andrii@kernel.org> · 2024-08-13
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Mateusz Guzik <hidden> · 2024-08-13
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Suren Baghdasaryan <surenb@google.com> · 2024-08-13
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Mateusz Guzik <hidden> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Andrii Nakryiko <hidden> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Suren Baghdasaryan <surenb@google.com> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Mateusz Guzik <hidden> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Jann Horn <jannh@google.com> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Mateusz Guzik <hidden> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Suren Baghdasaryan <surenb@google.com> · 2024-08-15
Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution · Andrii Nakryiko <hidden> · 2024-08-15
Re: [PATCH v3 00/13] uprobes: RCU-protected hot path optimizations · Oleg Nesterov <oleg@redhat.com> · 2024-08-15
Re: [PATCH v3 00/13] uprobes: RCU-protected hot path optimizations · Andrii Nakryiko <hidden> · 2024-08-15
Re: [PATCH v3 00/13] uprobes: RCU-protected hot path optimizations · Andrii Nakryiko <hidden> · 2024-08-21

From: Andrii Nakryiko <hidden>
Date: 2024-08-15 20:17:18
Also in: bpf, linux-mm, lkml

On Thu, Aug 15, 2024 at 11:58 AM Jann Horn [off-list ref] wrote:

+brauner for "struct file" lifetime

On Thu, Aug 15, 2024 at 7:45 PM Suren Baghdasaryan [off-list ref] wrote:

quoted

On Thu, Aug 15, 2024 at 9:47 AM Andrii Nakryiko [off-list ref] wrote:

quoted

On Thu, Aug 15, 2024 at 6:44 AM Mateusz Guzik [off-list ref] wrote:

quoted

On Tue, Aug 13, 2024 at 08:36:03AM -0700, Suren Baghdasaryan wrote:

quoted

On Mon, Aug 12, 2024 at 11:18 PM Mateusz Guzik [off-list ref] wrote:

quoted

On Mon, Aug 12, 2024 at 09:29:17PM -0700, Andrii Nakryiko wrote:

quoted

Now that files_cachep is SLAB_TYPESAFE_BY_RCU, we can safely access
vma->vm_file->f_inode locklessly, under rcu_read_lock() protection only,
and attempt the uprobe lookup speculatively.

Stupid question: Is this uprobe stuff actually such a hot codepath
that it makes sense to optimize it to be faster than the page fault
path?

Not a stupid question, but yes, generally speaking uprobe performance
is critical for a bunch of tracing use cases. Independent threads
implicitly contending with each other just because of an internal
implementation detail of uprobes (while conceptually there should be no
dependencies when triggering uprobes from multiple parallel threads) is
a big surprise to users, and it affects production workloads beyond the
uprobe-handling BPF logic overhead ("useful overhead") that they expect.

(Sidenote: I find it kinda interesting that this is sort of going back
in the direction of the old Speculative Page Faults design.)

quoted

We rely on the newly added mmap_lock_speculation_{start,end}() helpers to
validate that mm_struct stays intact for the entire duration of this
speculation. If not, we fall back to the mmap_lock-protected lookup.

This allows us to avoid contention on mmap_lock in the absolute majority of
cases, nicely improving uprobe/uretprobe scalability.
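As a rough illustration of the speculate/validate/fall-back flow described above, here is a userspace C11 model. It is a sketch under stated assumptions, not the kernel implementation: `struct mm_model`, `spec_start()`, `spec_end()`, `writer_update()` and `lookup_inode()` are hypothetical names, a pthread mutex stands in for mmap_lock, and a plain even/odd sequence counter stands in for whatever mmap_lock_speculation_{start,end}() actually check.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model: writers make `seq` odd while modifying and
 * even afterwards; readers snapshot an even `seq`, read without the lock,
 * and accept the result only if `seq` is unchanged.
 */
struct mm_model {
	atomic_uint seq;          /* even: no writer; odd: writer active */
	pthread_mutex_t lock;     /* stands in for mmap_lock */
	atomic_ulong inode_id;    /* stands in for vma->vm_file->f_inode */
};

static bool spec_start(struct mm_model *mm, unsigned *seq)
{
	*seq = atomic_load_explicit(&mm->seq, memory_order_acquire);
	return (*seq & 1) == 0;   /* refuse to speculate while a writer is active */
}

static bool spec_end(struct mm_model *mm, unsigned seq)
{
	/* keep the speculative loads above the re-check (smp_rmb() analog) */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&mm->seq, memory_order_relaxed) == seq;
}

static void writer_update(struct mm_model *mm, unsigned long new_id)
{
	pthread_mutex_lock(&mm->lock);
	unsigned s = atomic_load_explicit(&mm->seq, memory_order_relaxed);
	atomic_store_explicit(&mm->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* odd seq visible before data */
	atomic_store_explicit(&mm->inode_id, new_id, memory_order_relaxed);
	atomic_store_explicit(&mm->seq, s + 2, memory_order_release);
	pthread_mutex_unlock(&mm->lock);
}

/* Returns the "inode"; *speculated reports which path produced it. */
static unsigned long lookup_inode(struct mm_model *mm, bool *speculated)
{
	unsigned seq;

	if (spec_start(mm, &seq)) {
		unsigned long v = atomic_load_explicit(&mm->inode_id,
						       memory_order_relaxed);
		if (spec_end(mm, seq)) {
			*speculated = true;
			return v;
		}
	}
	/* slow path: fall back to the lock, like the mmap_lock fallback */
	pthread_mutex_lock(&mm->lock);
	unsigned long v = atomic_load_explicit(&mm->inode_id, memory_order_relaxed);
	pthread_mutex_unlock(&mm->lock);
	*speculated = false;
	return v;
}
```

A reader that sees an odd counter, or a counter that moved during its unlocked reads, simply takes the locked path, so correctness never depends on the speculation succeeding.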

[...]

quoted

Note: up_write(&vma->vm_lock->lock) in vma_start_write() is not
enough because it is one-way permeable (it's a "RELEASE operation"): a
later vma->vm_file store (or any other VMA modification) can move
before our vma->vm_lock_seq store.
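To make the ordering argument concrete, the following C11 sketch models the writer side with hypothetical stand-ins (`struct vma_model`, `vma_start_write_model()`, `vma_set_file_model()` and `reader_sees_new_seq_if_new_file()` are not kernel APIs). It assumes a release fence is an acceptable userspace analog of smp_wmb() here: placed after the vm_lock_seq store and before later VMA stores, it guarantees that a reader which observes the later store and then issues an acquire fence also observes the new vm_lock_seq.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the vm_area_struct fields involved. */
struct vma_model {
	atomic_int vm_lock_seq;
	atomic_uintptr_t vm_file;
};

/*
 * Writer: record the new lock sequence, then issue a release fence before
 * any later VMA modification (smp_wmb() analog). A release *operation*
 * such as an unlock would only stop earlier accesses from sinking below
 * it, not later stores from rising above it.
 */
static void vma_start_write_model(struct vma_model *vma, int mm_lock_seq)
{
	atomic_store_explicit(&vma->vm_lock_seq, mm_lock_seq, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void vma_set_file_model(struct vma_model *vma, uintptr_t file)
{
	/* must follow vma_start_write_model() in program order */
	atomic_store_explicit(&vma->vm_file, file, memory_order_relaxed);
}

/*
 * Reader-side check of the pairing: if the new vm_file is visible and is
 * followed by an acquire fence, the new vm_lock_seq must be visible too.
 */
static bool reader_sees_new_seq_if_new_file(struct vma_model *vma,
					    uintptr_t new_file, int new_seq)
{
	uintptr_t f = atomic_load_explicit(&vma->vm_file, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	int seq = atomic_load_explicit(&vma->vm_lock_seq, memory_order_relaxed);
	return f != new_file || seq == new_seq;
}
```

Without the fence in vma_start_write_model(), the C11 model would permit a reader to see the new vm_file together with the old vm_lock_seq, which is exactly the reordering the note describes.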

This makes vma_start_write() heavier, but again, it's write-locking, so
it should not be considered a fast path.
With this change we can use the code suggested by Andrii in
https://lore.kernel.org/all/CAEf4BzZeLg0WsYw2M7KFy0+APrPaPVBY7FbawB9vjcA2+6k69Q@mail.gmail.com/
with an additional smp_rmb():

rcu_read_lock();
vma = find_vma(...);
if (!vma) /* bail */

And maybe add some comments like:

/*
 * Load the current VMA lock sequence - we will detect if anyone concurrently
 * locks the VMA after this point.
 * Pairs with smp_wmb() in vma_start_write().
 */

quoted

vm_lock_seq = smp_load_acquire(&vma->vm_lock_seq);

/*
 * Now we just have to detect if the VMA is already locked with its current
 * sequence count.
 *
 * The following load is ordered against the vm_lock_seq load above (using
 * smp_load_acquire() for the load above), and pairs with implicit memory
 * ordering between the mm_lock_seq write in mmap_write_unlock() and the
 * vm_lock_seq write in the next vma_start_write() after that (which can only
 * occur after an mmap_write_lock()).
 */

quoted

mm_lock_seq = smp_load_acquire(&vma->mm->mm_lock_seq);
/* I think vm_lock has to be acquired first to avoid the race */
if (mm_lock_seq == vm_lock_seq)
        /* bail, vma is write-locked */
... perform uprobe lookup logic based on vma->vm_file->f_inode ...

/*
 * Order the speculative accesses above against the following vm_lock_seq
 * recheck.
 */

quoted

smp_rmb();
if (vma->vm_lock_seq != vm_lock_seq)

thanks, will incorporate these comments into the next revision

(As I said on the other thread: Since this now relies on
vma->vm_lock_seq not wrapping back to the same value for correctness,
I'd like to see vma->vm_lock_seq being at least an "unsigned long", or
even better, an atomic64_t... though I realize we don't currently do
that for seqlocks either.)
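A small worked example of the wraparound concern, assuming a 32-bit sequence counter for illustration (the helper names are hypothetical): after exactly 2^32 write-lock cycles a 32-bit counter returns to its starting value, so an equality-based recheck would miss the modification, whereas a 64-bit counter would need 2^64 cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Would a reader that snapshotted `start` consider the sequence unchanged
 * after `writes` increments? A "true" result for writes > 0 is a missed
 * modification caused by wraparound.
 */
static bool seq32_looks_unchanged(uint32_t start, uint64_t writes)
{
	return (uint32_t)(start + writes) == start;   /* arithmetic mod 2^32 */
}

static bool seq64_looks_unchanged(uint64_t start, uint64_t writes)
{
	return start + writes == start;               /* arithmetic mod 2^64 */
}
```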

quoted

        /* bail, VMA might have changed */

The smp_rmb() is needed so that the vma->vm_lock_seq re-load does not get
reordered and moved up before the speculative accesses.
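Putting the reader-side steps from the exchange above together, a userspace C11 model might look like the sketch below. All types and the `speculative_inode()` helper are hypothetical stand-ins; acquire loads play the role of smp_load_acquire(), an acquire fence plays the role of smp_rmb(), and the model assumes the per-VMA lock convention that a VMA is write-locked exactly when its vm_lock_seq equals mm_lock_seq.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct mm_model {
	atomic_int mm_lock_seq;
};

struct vma_model {
	atomic_int vm_lock_seq;
	atomic_uintptr_t inode;     /* stands in for vma->vm_file->f_inode */
	struct mm_model *mm;
};

/*
 * Speculatively read the VMA's inode. Returns false ("bail") if the VMA is
 * write-locked or was locked during the speculation; a caller would then
 * retry under the real lock.
 */
static bool speculative_inode(struct vma_model *vma, uintptr_t *inode)
{
	int vm_lock_seq = atomic_load_explicit(&vma->vm_lock_seq,
					       memory_order_acquire);
	int mm_lock_seq = atomic_load_explicit(&vma->mm->mm_lock_seq,
					       memory_order_acquire);

	if (mm_lock_seq == vm_lock_seq)
		return false;       /* bail, VMA is write-locked */

	uintptr_t v = atomic_load_explicit(&vma->inode, memory_order_relaxed);

	/* smp_rmb() analog: keep the speculative load above the re-check */
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&vma->vm_lock_seq, memory_order_relaxed) !=
	    vm_lock_seq)
		return false;       /* bail, VMA might have changed */

	*inode = v;
	return true;
}
```

The final recheck only works because any modification must be preceded by a write-lock that changes vm_lock_seq, which is why the counter not wrapping back to the snapshotted value matters.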

I'm CC'ing Jann since he understands memory barriers way better than
me and will keep me honest.
