Thread: 72 messages, 8 authors, 2013-06-27

Re: [PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline

From: Srivatsa S. Bhat <hidden>
Date: 2013-06-26 14:13:08
Also in: linux-arch, linux-pm, linuxppc-dev, lkml

On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able
>> to depend on disabling preemption to prevent CPUs from going offline
>> from under us.
>>
>> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
>> while being protected by a spinlock. Use the get/put_online_cpus_atomic()
>> APIs to prevent CPUs from going offline, while invoking from atomic context.
>
> I am not completely sure that this is needed.  Here is a (quite possibly
> flawed) argument for its not being needed:
>
> o	rcu_gp_init() holds off CPU-hotplug operations during
> 	grace-period initialization.  Therefore, RCU will avoid
> 	looking for quiescent states from CPUs that were offline
> 	(and thus in an extended quiescent state) at the beginning
> 	of the grace period.
>
> o	If force_qs_rnp() is looking for a quiescent state from
> 	a given CPU, and if it senses that CPU as being offline,
> 	then even without synchronization we know that the CPU
> 	was offline some time during the current grace period.
>
> 	After all, it was online at the beginning of the grace
> 	period (otherwise, we would not be looking at it at all),
> 	and our later sampling of its state must have therefore
> 	happened after the start of the grace period.  Given that
> 	the grace period has not yet ended, it also has to have
> 	happened before the end of the grace period.
>
> o	Therefore, we should be able to sample the offline state
> 	without synchronization.

Thanks a lot for explaining the synchronization design in detail, Paul!
I agree that get/put_online_cpus_atomic() is not necessary here.
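
To make the argument concrete, here is a rough single-threaded userspace
sketch of the logic only. It is not kernel code and says nothing about
memory ordering: struct gp_model, gp_init(), gp_force_qs() and gp_done()
are hypothetical names invented for this illustration, standing in
loosely for what rcu_gp_init() and force_qs_rnp() do.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model of one grace period; not a kernel data structure. */
struct gp_model {
	bool online[NR_CPUS];	/* hotplug state, may change at any time */
	bool need_qs[NR_CPUS];	/* CPUs this grace period still waits on */
};

/*
 * Models grace-period initialization. Hotplug is assumed to be held off
 * here, so the snapshot is consistent: CPUs that are offline now are
 * never waited on.
 */
static void gp_init(struct gp_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->need_qs[cpu] = m->online[cpu];
}

/*
 * Models the force-quiescent-state scan. It samples online[] with no
 * hotplug synchronization. A CPU seen offline here was online at
 * gp_init(), so it went offline during this grace period and can be
 * counted as having passed through a quiescent state.
 */
static void gp_force_qs(struct gp_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m->need_qs[cpu] && !m->online[cpu])
			m->need_qs[cpu] = false;
}

/* The modelled grace period ends once no CPU is still being waited on. */
static bool gp_done(const struct gp_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m->need_qs[cpu])
			return false;
	return true;
}
```

In this model a stale "online" reading only delays the scan by one pass,
while an "offline" reading is always safe to act on, which is the
property the argument relies on.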

Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid
false-positives, I'm thinking of introducing a few _nocheck() variants,
on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in RCU)
and for_each_online_cpu_nocheck() (useful in percpu-counter code, as
pointed out by Tejun Heo). These fine synchronization details are kinda
hard to encapsulate in that debug logic, so we can use the _nocheck()
variants here to avoid getting splats when running with DEBUG_HOTPLUG_CPU
enabled.
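
As a rough idea of the shape I have in mind, here is a userspace mock-up,
not the actual patch. Everything prefixed with model_ is a hypothetical
stand-in made up for this sketch; the real debug infrastructure under
CONFIG_DEBUG_HOTPLUG_CPU may well look different.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for the online mask and the debug state. */
static bool model_cpu_online_map[NR_CPUS] = { true, true, true, true };
static int model_hotplug_readers;	/* > 0 while hotplug is held off */
static int model_debug_splats;		/* warnings emitted so far */

/* Stand-ins for the atomic hotplug reader-side APIs in this series. */
static void model_get_online_cpus_atomic(void) { model_hotplug_readers++; }
static void model_put_online_cpus_atomic(void) { model_hotplug_readers--; }

/* Unchecked variant: the caller vouches for its own synchronization. */
static bool model_cpu_is_offline_nocheck(int cpu)
{
	return !model_cpu_online_map[cpu];
}

/* Checked variant: complains when called without hotplug protection. */
static bool model_cpu_is_offline(int cpu)
{
	if (model_hotplug_readers == 0) {
		model_debug_splats++;
		fprintf(stderr, "splat: cpu %d sampled unprotected\n", cpu);
	}
	return model_cpu_is_offline_nocheck(cpu);
}
```

A caller like the RCU scan discussed above would use the _nocheck()
variant and stay quiet, while every other unprotected caller would still
trigger the warning.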

Regards,
Srivatsa S. Bhat