Thread: 38 messages, 11 authors, 2023-05-09

Re: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()

From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2023-03-04 01:25:46
Also in: lkml

On Fri, Mar 03, 2023 at 03:44:13PM -0800, Jakub Kicinski wrote:
On Fri, 3 Mar 2023 15:36:27 -0800 Paul E. McKenney wrote:
On Fri, Mar 03, 2023 at 02:37:39PM -0800, Paul E. McKenney wrote:
On Fri, Mar 03, 2023 at 01:31:43PM -0800, Jakub Kicinski wrote:  
Now, about the max loop count. I ORed the pending softirqs every
time we got to the end of the loop. It looks like the vast majority of
the wakeups triggered by the loop counter are due exclusively to RCU:

@looped[512]: 5516

where 512 is the pending mask ORed over all iterations, and
512 == 1 << RCU_SOFTIRQ.

And they usually take less than 100us to consume the 10 iterations.
Histogram of usecs consumed when we run out of loop iterations:

[16, 32)               3 |                                                    |
[32, 64)            4786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)            871 |@@@@@@@@@                                           |
[128, 256)            34 |                                                    |
[256, 512)             9 |                                                    |
[512, 1K)            262 |@@                                                  |
[1K, 2K)              35 |                                                    |
[2K, 4K)               1 |                                                    |

Paul, is this expected? Is RCU not trying too hard to be nice?  
This is from way back in the day, so it is quite possible that better
tuning and/or better heuristics should be applied.

On the other hand, 100 microseconds is a good long time from a
CONFIG_PREEMPT_RT=y perspective!
  
# cat /sys/module/rcutree/parameters/blimit
10

Or should we perhaps just raise the loop limit? Breaking after less 
than 100usec seems excessive :(  
But note that RCU also has rcutree.rcu_divisor, which defaults to 7,
and rcutree.rcu_resched_ns, which defaults to three milliseconds
(3,000,000 nanoseconds).  This means that RCU will do:

o	All the callbacks if there are fewer than ten.

o	Ten callbacks or 1/128th of them, whichever is larger.

o	Unless the larger of them is more than 100 callbacks, in which
	case there is an additional limit of three milliseconds' worth
	of them.

Except that if a given CPU ends up with more than 10,000 callbacks
(rcutree.qhimark), that CPU's blimit is set to 10,000.  
Also, in the context of a softirq handler (as opposed to ksoftirqd)
that interrupted the idle task with no other task waiting to run, the
count of callbacks is ignored and only the 3-millisecond limit counts.
In the context of ksoftirqd, the only limit is the one the scheduler
chooses to impose.

But it sure seems like the ksoftirqd case should also pay attention to
that 3-millisecond limit.  I will queue a patch to that effect, and maybe
Eric Dumazet will show me the error of my ways.
Just to be sure - have you seen Peter's patches?

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/softirq

I think it feeds the time limit to the callback from softirq,
so the local 3ms is no more?
I might or might not have, back in September of 2020.  ;-)

But either way, the question remains:  Should RCU_SOFTIRQ do time checking
in ksoftirqd context?  Seems like the answer should be "yes", independently
of Peter's patches.

							Thanx, Paul