Re: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()
From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2023-03-04 01:25:46
Also in:
lkml
On Fri, Mar 03, 2023 at 03:44:13PM -0800, Jakub Kicinski wrote:
> On Fri, 3 Mar 2023 15:36:27 -0800 Paul E. McKenney wrote:
> > On Fri, Mar 03, 2023 at 02:37:39PM -0800, Paul E. McKenney wrote:
> > > On Fri, Mar 03, 2023 at 01:31:43PM -0800, Jakub Kicinski wrote:
> > > > Now - now about the max loop count. I ORed the pending softirqs every
> > > > time we get to the end of the loop. Looks like the vast majority of
> > > > the loop counter wake ups are exclusively due to RCU:
> > > >
> > > > @looped[512]: 5516
> > > >
> > > > Where 512 is the ORed pending mask over all iterations
> > > > 512 == 1 << RCU_SOFTIRQ.
> > > >
> > > > And they usually take less than 100us to consume the 10 iterations.
> > > > Histogram of usecs consumed when we run out of loop iterations:
> > > >
> > > > [16, 32)           3 |                                                    |
> > > > [32, 64)        4786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > [64, 128)        871 |@@@@@@@@@                                           |
> > > > [128, 256)        34 |                                                    |
> > > > [256, 512)         9 |                                                    |
> > > > [512, 1K)        262 |@@                                                  |
> > > > [1K, 2K)          35 |                                                    |
> > > > [2K, 4K)           1 |                                                    |
> > > >
> > > > Paul, is this expected? Is RCU not trying too hard to be nice?
> > >
> > > This is from way back in the day, so it is quite possible that better
> > > tuning and/or better heuristics should be applied.
> > >
> > > On the other hand, 100 microseconds is a good long time from a
> > > CONFIG_PREEMPT_RT=y perspective!
> > >
> > > > # cat /sys/module/rcutree/parameters/blimit
> > > > 10
> > > >
> > > > Or should we perhaps just raise the loop limit? Breaking after less
> > > > than 100usec seems excessive :(
> > >
> > > But note that RCU also has rcutree.rcu_divisor, which defaults to 7.
> > > And an rcutree.rcu_resched_ns, which defaults to three milliseconds
> > > (3,000,000 nanoseconds). This means that RCU will do:
> > >
> > > o  All the callbacks if there are fewer than ten.
> > >
> > > o  Ten callbacks or 1/128th of them, whichever is larger.
> > >
> > > o  Unless the larger of them is more than 100 callbacks, in which
> > >    case there is an additional limit of three milliseconds' worth
> > >    of them.
> > >
> > > Except that if a given CPU ends up with more than 10,000 callbacks
> > > (rcutree.qhimark), that CPU's blimit is set to 10,000.
> >
> > Also, if in the context of a softirq handler (as opposed to ksoftirqd)
> > that interrupted the idle task with no pending task, the count of
> > callbacks is ignored and only the 3-millisecond limit counts. In the
> > context of ksoftirqd, the only limit is that which the scheduler chooses
> > to impose.
> >
> > But it sure seems like the ksoftirqd case should also pay attention to
> > that 3-millisecond limit. I will queue a patch to that effect, and maybe
> > Eric Dumazet will show me the error of my ways.
>
> Just to be sure - have you seen Peter's patches?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/softirq
>
> I think it feeds the time limit to the callback from softirq,
> so the local 3ms is no more?
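
The callback-count rules in the quoted explanation can be written down as a small C function, which makes the interaction between blimit, rcu_divisor and qhimark easy to check by hand. This is a minimal sketch of the rules as stated in the thread, not the kernel's code: `struct batch_limit`, `compute_batch_limit()` and `cbs_this_batch()` are hypothetical names, the constants are the defaults quoted above, and the qhimark case is simplified (blimit is raised to 10,000 only while the backlog exceeds qhimark; how blimit later drops back is not modelled).

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults as quoted in this thread (rcutree module parameters). */
#define BLIMIT_DEFAULT 10L      /* rcutree.blimit */
#define RCU_DIVISOR    7        /* rcutree.rcu_divisor: 1/2^7 == 1/128 */
#define QHIMARK        10000L   /* rcutree.qhimark */
#define BLIMIT_MAX     10000L   /* blimit once qhimark is exceeded */

/* Hypothetical summary of the limits applied to one batch. */
struct batch_limit {
	long max_cbs;     /* callback-count limit for this batch */
	bool use_tlimit;  /* also capped by rcutree.rcu_resched_ns (3 ms)? */
};

/*
 * Simplified model (assumption: no hysteresis on blimit).
 * pending is the number of callbacks ready to invoke.
 */
static struct batch_limit compute_batch_limit(long pending)
{
	struct batch_limit bl;
	long blimit = pending > QHIMARK ? BLIMIT_MAX : BLIMIT_DEFAULT;
	long frac = pending >> RCU_DIVISOR;   /* 1/128th of the backlog */

	assert(pending >= 0);
	bl.max_cbs = blimit > frac ? blimit : frac;  /* whichever is larger */
	bl.use_tlimit = bl.max_cbs > 100;            /* 3 ms cap on big batches */
	return bl;
}

/* Callbacks this batch may invoke: all of them if fewer than the limit. */
static long cbs_this_batch(long pending)
{
	long max_cbs = compute_batch_limit(pending).max_cbs;

	return pending < max_cbs ? pending : max_cbs;
}
```

For example, `cbs_this_batch(5)` is 5 and `cbs_this_batch(2560)` is 20. With these defaults the 1/128th rule only lifts the limit above ten once at least 1,408 callbacks are queued, and it cannot exceed 100 until the backlog has already passed qhimark, so in this model the 3 ms cap only engages after a qhimark overflow.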
I might or might not have back in September of 2020. ;-)

But either way, the question remains: Should RCU_SOFTIRQ do time
checking in ksoftirqd context? Seems like the answer should be "yes",
independently of Peter's patches.

Thanx, Paul
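
The open question, whether RCU_SOFTIRQ should check the time in ksoftirqd context, can be illustrated with a toy invocation loop. This is a hedged sketch under explicit assumptions, not the kernel's callback-invocation code: `struct batch_ctx`, `invoke_batch()` and the `ksoftirqd_time_check` flag are hypothetical; time is simulated with a fixed per-callback cost instead of a real clock; the idle-task special case is not modelled; and, when enabled, the proposed ksoftirqd time check is assumed to apply regardless of batch size.

```c
#include <assert.h>
#include <stdbool.h>

#define RESCHED_NS 3000000L  /* rcutree.rcu_resched_ns default: 3 ms */

/* Hypothetical inputs for one batch of callback invocation. */
struct batch_ctx {
	long pending;              /* callbacks ready to invoke */
	long max_cbs;              /* count limit (blimit/rcu_divisor rules) */
	long cb_cost_ns;           /* simulated cost of each callback */
	bool in_ksoftirqd;         /* running from ksoftirqd, not irq exit */
	bool ksoftirqd_time_check; /* proposed: honor 3 ms in ksoftirqd too
	                              (assumed: for any batch size) */
};

/* Returns the number of callbacks invoked before the batch stops. */
static long invoke_batch(const struct batch_ctx *c)
{
	long elapsed_ns = 0;
	long count = 0;
	/* Per the thread: in ksoftirqd, RCU itself imposes no count limit... */
	bool count_limited = !c->in_ksoftirqd;
	/* ...and outside ksoftirqd, 3 ms applies only above 100 callbacks. */
	bool time_limited = c->in_ksoftirqd ? c->ksoftirqd_time_check
	                                    : c->max_cbs > 100;

	assert(c->pending >= 0 && c->cb_cost_ns > 0);
	while (count < c->pending) {
		if (count_limited && count >= c->max_cbs)
			break;
		if (time_limited && elapsed_ns >= RESCHED_NS)
			break;
		elapsed_ns += c->cb_cost_ns;  /* "invoke" one callback */
		count++;
	}
	return count;
}
```

In this toy model, a ksoftirqd run over a 10,000-callback backlog at 1 microsecond per callback invokes all 10,000 callbacks without the check, and stops after 3,000 (3 ms of simulated time) with it.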