Re: [PATCH v2] tracing: Fix regression of filter waiting a long time on RCU synchronization
From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2025-06-09 16:09:36
Also in:
lkml
On Mon, Jun 09, 2025 at 11:39:55AM -0400, Steven Rostedt wrote:
On Mon, 9 Jun 2025 05:21:19 -0700 "Paul E. McKenney" [off-list ref] wrote:quoted
quoted
+static void free_filter_list(struct rcu_head *rhp) +{ + struct filter_head *filter_list = container_of(rhp, struct filter_head, rcu); + struct filter_list *filter_item, *tmp; + + list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) { + __free_filter(filter_item->filter); + list_del(&filter_item->list); + kfree(filter_item); + }If the series of synchronize_rcu() calls was taking 99 seconds, we might well be going through the above loop quite a few times, all within a bh-disabled region of code. For example, the calls from both filter_free_subsystem_filters() and process_system_preds() look like they can queue a very large number of entries.The iteration is for each event in a subsystem. Seeing what that is: ~# for sys in /sys/kernel/tracing/events/* ; do cnt=`ls -F $sys |grep '/$' | wc -l`; echo $cnt $sys; done | sort -n | tail 24 /sys/kernel/tracing/events/tcp 26 /sys/kernel/tracing/events/xen 28 /sys/kernel/tracing/events/sched 32 /sys/kernel/tracing/events/writeback 33 /sys/kernel/tracing/events/libata 34 /sys/kernel/tracing/events/irq_vectors 55 /sys/kernel/tracing/events/xhci-hcd 92 /sys/kernel/tracing/events/kvm 113 /sys/kernel/tracing/events/ext4 724 /sys/kernel/tracing/events/syscalls So the most loops that can happen is 724 on the syscalls plus the system filter. And adding in printks to count the loops, I verified that: [..] [ 100.019209] LOOPED 4 times [ 100.020033] LOOPED 5 times [ 100.020791] LOOPED 8 times [ 100.021583] LOOPED 10 times [ 100.022239] LOOPED 2 times [ 100.110515] LOOPED 725 times [ 100.111520] LOOPED 4 times [ 100.112889] LOOPED 25 times [ 100.114416] LOOPED 6 times [ 100.115950] LOOPED 4 times [ 100.117557] LOOPED 7 times [..] (724 event filters and one system event filter = 725 filters). If you feel that's too big, we could move it to a work queue.
That does sound like a bit more than would be good.
As more system calls get added, that number will grow. But it is limited to the number of system calls. All the event systems are way fewer than that.quoted
So...quoted
+ kfree(filter_list); +} + +static void free_filter_list_tasks(struct rcu_head *rhp) +{ + call_rcu(rhp, free_filter_list);...why not use queue_rcu_work() in place of call_rcu() here, thus avoiding possible response-time issues? Or is there something that I am missing that limits the length of the list passed to free_filter_list()?Just the number of events in a system. I could switch to a work queue but I don't think it needs to go as a fix, do you? That is, it can wait till the next merge window. Or is this going to affect things now?
Thank youi, and the next merge window should be fine. Thanx, Paul