Re: [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list
From: Peter Zijlstra <peterz@infradead.org>
Date: 2024-07-15 11:25:09
Also in:
bpf, lkml
On Fri, Jul 12, 2024 at 02:06:08PM -0700, Andrii Nakryiko wrote:
+ bpf@vger, please cc bpf ML for the next revision, these changes are very relevant there as well, thanks On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra [off-list ref] wrote:quoted
With handle_swbp() hitting concurrently on (all) CPUs the uprobe->register_rwsem can get very contended. Add an SRCU instance to cover the consumer list and consumer lifetime. Since the consumer are externally embedded structures, unregister will have to suffer a synchronize_srcu(). A notably complication is the UPROBE_HANDLER_REMOVE logic which can race against uprobe_register() such that it might want to remove a freshly installer handler that didn't get called. In order to close this hole, a seqcount is added. With that, the removal path can tell if anything changed and bail out of the removal. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/events/uprobes.c | 60 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 50 insertions(+), 10 deletions(-)[...]quoted
@@ -800,7 +808,7 @@ static bool consumer_del(struct uprobe * down_write(&uprobe->consumer_rwsem); for (con = &uprobe->consumers; *con; con = &(*con)->next) { if (*con == uc) { - *con = uc->next; + WRITE_ONCE(*con, uc->next);I have a dumb and mechanical question. Above in consumer_add() you are doing WRITE_ONCE() for uc->next assignment, but rcu_assign_pointer() for uprobe->consumers. Here, you are doing WRITE_ONCE() for the same operation, if it so happens that uc == *con == uprobe->consumers. So is rcu_assign_pointer() necessary in consumer_addr()? If yes, we should have it here as well, no? And if not, why bother with it in consumer_add()?
add is a publish and needs to ensure all stores to the object are ordered before the object is linked in. Note that rcu_assign_pointer() is basically a fancy way of writing smp_store_release(). del otoh does not publish, it removes and doesn't need ordering. It does however need to ensure the pointer write itself isn't torn. That is, without the WRITE_ONCE() the compiler is free to do byte stores in order to update the 8 byte pointer value (assuming 64bit). This is pretty dumb, but very much permitted by C and also utterly fatal in the case where an RCU iteration comes by and reads a half-half pointer value.
quoted
ret = true; break; }@@ -1139,9 +1147,13 @@ void uprobe_unregister(struct inode *ino return; down_write(&uprobe->register_rwsem); + raw_write_seqcount_begin(&uprobe->register_seq); __uprobe_unregister(uprobe, uc); + raw_write_seqcount_end(&uprobe->register_seq); up_write(&uprobe->register_rwsem); put_uprobe(uprobe); + + synchronize_srcu(&uprobes_srcu); } EXPORT_SYMBOL_GPL(uprobe_unregister);[...]