Thread: 13 messages, 4 authors, 2009-04-27

Re: [PATCH RFC] v2 expedited "big hammer" RCU grace periods

From: Paul E. McKenney <hidden>
Date: 2009-04-27 16:18:11
Also in: lkml, netfilter-devel

On Mon, Apr 27, 2009 at 03:43:02PM +0200, Ingo Molnar wrote:
* Paul E. McKenney [off-list ref] wrote:
quoted
On Mon, Apr 27, 2009 at 05:26:39AM +0200, Ingo Molnar wrote:
quoted
* Paul E. McKenney [off-list ref] wrote:
quoted
On Sun, Apr 26, 2009 at 10:22:55PM +0200, Ingo Molnar wrote:
quoted
* Mathieu Desnoyers [off-list ref] wrote:
quoted
* Ingo Molnar (mingo@elte.hu) wrote:
quoted
* Paul E. McKenney [off-list ref] wrote:
quoted
Second cut of "big hammer" expedited RCU grace periods, but only 
for rcu_bh.  This creates another softirq vector, so that entering 
this softirq vector will have forced an rcu_bh quiescent state (as 
noted by Dave Miller).  Use smp_call_function() to invoke 
raise_softirq() on all CPUs in order to cause this to happen.  
Track the CPUs that have passed through a quiescent state (or gone 
offline) with a cpumask.
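
A minimal userspace sketch of the bookkeeping described above, assuming one bit per CPU in a pending mask. The class and method names are hypothetical and are not kernel APIs; a real implementation would use a cpumask and would need locking.

```python
# Userspace model only: none of these names exist in the kernel.
class ExpeditedGracePeriod:
    def __init__(self, online_cpus):
        # Bitmask of CPUs that still owe a quiescent state.
        self.pending = 0
        for cpu in online_cpus:
            self.pending |= 1 << cpu

    def note_quiescent(self, cpu):
        """Model of `cpu` entering the softirq handler raised for this purpose."""
        self.pending &= ~(1 << cpu)

    def note_offline(self, cpu):
        """The patch description treats going offline like a quiescent state."""
        self.pending &= ~(1 << cpu)

    def done(self):
        # The expedited grace period ends once no CPU is left in the mask.
        return self.pending == 0
```

Clearing a bit is idempotent, so a CPU that reports twice, or reports and then goes offline, does no harm in this model.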
hm, i'm still asking whether doing this would be simpler via a 
reschedule vector - which not only is an existing facility but also 
forces all RCU domains through a quiescent state - not just bh-RCU 
participants.

Triggering a new softirq is in no way simpler than doing an SMP 
cross-call - in fact softirqs are a finite resource so using some 
other facility would be preferred.

Am i missing something?
I think the reason for this whole thread is that waiting for an RCU 
quiescent state, when called many times e.g. in multiple iptables 
invocations, takes too long (5 seconds to load the netfilter 
rules at boot). [...]
I'm aware of the problem space.

I was suggesting that to trigger the quiescent state and to wait for 
it to propagate it would be enough to reuse the reschedule 
mechanism.

It would be relatively straightforward: first a send-reschedule then 
do a wait_task_context_switch() on rq->curr - both are existing 
primitives. (a task reference has to be taken but that's pretty much 
all)
Well, one reason I didn't take this approach was that I didn't 
happen to think of it.  ;-)

Also that I hadn't heard of wait_task_context_switch().

Hmmm...  Looking for wait_task_context_switch().  OK, found it.

It looks to me that this primitive won't return until the 
scheduler actually decides to run something else.  We instead need 
to have something that stops waiting once the CPU enters the 
scheduler, hence the previous thought of making rcu_qsctr_inc() do 
a bit of extra work.

This would be a way of making an expedited RCU-sched across all 
RCU implementations.  As noted in the earlier email, it would not 
handle RCU or RCU-bh in a -rt kernel.
quoted
By the time wait_task_context_switch() returns from the last CPU 
we know that the quiescent state has passed.
We would want to wait for all of the CPUs in parallel, though, 
wouldn't we?  Seems that we would not want to wait for the last 
CPU to do another trip through the scheduler if it had already 
passed through the scheduler while we were waiting on the earlier 
CPUs.

So it seems like we would still want a two-pass approach -- one 
pass to capture the current state, the second pass to wait for the 
state to change.
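
The two-pass idea can be sketched in userspace as follows, assuming each CPU has a counter that is bumped at every quiescent state. That counter is a hypothetical stand-in for the "extra work" suggested for rcu_qsctr_inc(), not the kernel's actual data structure.

```python
def snapshot(counters):
    """Pass 1: record every CPU's quiescent-state counter."""
    return list(counters)

def cpus_still_pending(counters, snap):
    """Pass 2 check: CPUs whose counter has not moved since the snapshot."""
    return [cpu for cpu, (now, then) in enumerate(zip(counters, snap))
            if now == then]
```

Because all counters are captured before any waiting starts, a CPU that passes through the scheduler while the waiter is still looking at earlier CPUs is already counted and does not have to make another trip.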
I think waiting in parallel is still possible (first kick all tasks, 
then make sure all tasks have left the CPU at least once).

The busy-waiting in wait_task_context_switch() is indeed a 
problem - but perhaps that could be refactored to be a 
migration-thread driven wait_for_completion() + complete() 
cycle? It could be driven by preempt notifiers perhaps - and 
become zero-cost.
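
A rough userspace analogue of the "kick everything first, then sleep until each CPU reports" pattern, with threading.Event standing in for the kernel's completion and a caller-supplied kick callback standing in for the reschedule request. Both are assumptions for illustration; nothing here is kernel code.

```python
import threading

def expedite(num_cpus, kick):
    """Kick every CPU first, then sleep until each one has reported in."""
    done = [threading.Event() for _ in range(num_cpus)]
    for cpu in range(num_cpus):
        # Pass 1: ask each CPU to report, without waiting for any of them.
        kick(cpu, done[cpu].set)
    for ev in done:
        # Pass 2: block (no busy-waiting) until that CPU has reported.
        ev.wait()
    return True
```

The waiter sleeps rather than spins, and the total wait is bounded by the slowest CPU rather than the sum over all CPUs.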
Hmmm...  It would need to be informed of the quiescent state even 
if that quiescent state did not result in a preemption.

But you are right -- I do need to expedite RCU, not just RCU-bh, 
especially given that the boot-speed guys are starting to see 
grace periods as a measurable fraction of the boot time.  I will 
take another pass at this.
The precise method of signalling is a detail i suspect - so by all 
means use a new softirq if that is the cleanest. I'd also agree that 
covering not just bh-rcu would definitely be a good idea.
Fair enough!  ;-)

							Thanx, Paul