Re: [PATCH RFC] v7 expedited "big hammer" RCU grace periods
From: Paul E. McKenney <hidden>
Date: 2009-05-25 16:44:50
Also in:
lkml, netfilter-devel
On Mon, May 25, 2009 at 02:35:15PM +0800, Lai Jiangshan wrote:
Paul E. McKenney wrote:quoted
Seventh cut of "big hammer" expedited RCU grace periods. This leverages the existing per-CPU migration kthreads, as suggested by Ingo. These are awakened in a loop, and waited for in a second loop. Not fully scalable, but removing the extra hop through smp_call_function reduces latency on systems with moderate numbers of CPUs. The synchronize_rcu_expedited() and and synchronize_bh_expedited() primitives invoke synchronize_sched_expedited(), except for CONFIG_PREEMPT_RCU, where they instead invoke synchronize_rcu() and synchronize_rcu_bh(), respectively. This will be fixed in the future, after preemptable RCU is folded into the rcutree implementation.I'm strongly need this guarantee: preempt_disable() guarantees/implies rcu_read_lock(). And local_irq_diable() guarantees/implies rcu_read_lock(). rcu_read_lock_bh() guarantees/implies rcu_read_lock(). It will simplifies codes.
Hmmm... I did produce a patch that modified preemptable RCU in this way quite some time back, but at that time the problem was solved in a different way. The approach was either to add rcu_read_lock()/rcu_read_unlock() pairs as needed, or to switch to call_rcu_sched() or synchronize_sched() for the update side. But please note that this approach would not permit the current implementation of synchronize_sched_expedited() to be used in the preemptable-RCU case, because synchronize_sched_expedited() would not wait for RCU read-side critical sections that had been preempted. And the ability to preempt RCU read-side critical sections is absolutely required for CONFIG_PREEMPT_RT versions of the Linux kernel.
And A lot's of current kernel code does not use rcu_read_lock() when it has preempt_disable()-ed/local_irq_diable()-ed or when it is in irq/softirq. Without these guarantees, these code is broken.
One way or another, such code does need to be fixed. Either by reintroducing the old patch, or by fixing the code itself.
quoted
+ +static DEFINE_PER_CPU(struct migration_req, rcu_migration_req); +static DEFINE_MUTEX(rcu_sched_expedited_mutex); + +/* + * Wait for an rcu-sched grace period to elapse, but use "big hammer" + * approach to force grace period to end quickly. This consumes + * significant time on all CPUs, and is thus not recommended for + * any sort of common-case code. + */ +void synchronize_sched_expedited(void) +{ + int cpu; + unsigned long flags; + struct rq *rq; + struct migration_req *req; + + mutex_lock(&rcu_sched_expedited_mutex); + get_online_cpus(); + for_each_online_cpu(cpu) { + rq = cpu_rq(cpu); + req = &per_cpu(rcu_migration_req, cpu); + init_completion(&req->done); + req->task = NULL; + req->dest_cpu = -1; + spin_lock_irqsave(&rq->lock, flags); + list_add(&req->list, &rq->migration_queue); + spin_unlock_irqrestore(&rq->lock, flags); + wake_up_process(rq->migration_thread); + } + for_each_online_cpu(cpu) { + req = &per_cpu(rcu_migration_req, cpu); + wait_for_completion(&req->done); + } + put_online_cpus(); + mutex_unlock(&rcu_sched_expedited_mutex); +} +EXPORT_SYMBOL_GPL(synchronize_sched_expedited); + +#endif /* #else #ifndef CONFIG_SMP */Very nice implement!
Glad you like it!!! The idea is Ingo's. Gah!!! I need to have included a Suggested-by on v7 of the patch!!!
Only one opinion:
get_online_cpus() is a large lock, a lot's of lock in kernel is required
after cpu_hotplug.lock.
_cpu_down()
cpu_hotplug_begin()
mutex_lock(&cpu_hotplug.lock)
__raw_notifier_call_chain(CPU_DOWN_PREPARE)
Lock a-kernel-lock.
It means when we have held a-kernel-lock, we can not call
synchronize_sched_expedited(). get_online_cpus() narrows
synchronize_sched_expedited()'s usages.
I think we can reuse req->dest_cpu and remove get_online_cpus().
(and use preempt_disable() and for_each_possible_cpu())
req->dest_cpu = -2 means @req is not queued
req->dest_cpu = -1 means @req is queued
a little like this code:
mutex_lock(&rcu_sched_expedited_mutex);
for_each_possible_cpu(cpu) {
preempt_disable()
if (cpu is not online)
just set req->dest_cpu to -2;
else
init and queue req, and wake_up_process().
preempt_enable()
}
for_each_possible_cpu(cpu) {
if (req is queued)
wait_for_completion().
}
mutex_unlock(&rcu_sched_expedited_mutex);Good point -- I should at the very least add a comment to synchronize_sched_expedited() stating that it cannot be called holding any lock that is acquired in a CPU hotplug notifier. If this restriction causes any problems, then your approach seems like a promising fix.
The coupling of synchronize_sched_expedited() and migration_req is largely increased: 1) The offline cpu's per_cpu(rcu_migration_req, cpu) is handled. See migration_call::CPU_DEAD
Good. ;-)
2) migration_call() is the highest priority of cpu notifiers, So even any other cpu notifier calls synchronize_sched_expedited(), It'll not cause DEADLOCK.
You mean if using your preempt_disable() approach, right? Unless I am missing something, the current get_online_cpus() approach would deadlock in this case. Thanx, Paul