Re: [PATCH v2 15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline
From: Srivatsa S. Bhat <hidden>
Date: 2013-06-26 14:13:08
Also in: linux-arch, linux-pm, linuxppc-dev, lkml
On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able to depend on disabling preemption to prevent CPUs from going offline from under us.
>>
>> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline, while being protected by a spinlock. Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline, while invoking from atomic context.
>
> I am not completely sure that this is needed. Here is a (quite possibly flawed) argument for its not being needed:
>
> o rcu_gp_init() holds off CPU-hotplug operations during grace-period initialization. Therefore, RCU will avoid looking for quiescent states from CPUs that were offline (and thus in an extended quiescent state) at the beginning of the grace period.
>
> o If force_qs_rnp() is looking for a quiescent state from a given CPU, and if it senses that CPU as being offline, then even without synchronization we know that the CPU was offline some time during the current grace period. After all, it was online at the beginning of the grace period (otherwise, we would not be looking at it at all), and our later sampling of its state must have therefore happened after the start of the grace period. Given that the grace period has not yet ended, it also has to have happened before the end of the grace period.
>
> o Therefore, we should be able to sample the offline state without synchronization.
Thanks a lot for explaining the synchronization design in detail, Paul! I agree that get/put_online_cpus_atomic() is not necessary here.

Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid false-positives, I'm thinking of introducing a few _nocheck() variants, on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in RCU) and for_each_online_cpu_nocheck() (useful in percpu-counter code, as pointed out by Tejun Heo). These fine synchronization details are kinda hard to encapsulate in that debug logic, so we can use the _nocheck() variants here to avoid getting splats when running with DEBUG_HOTPLUG_CPU enabled.

Regards,
Srivatsa S. Bhat
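
To make the _nocheck() idea concrete, here is a minimal user-space sketch in C. It is only a model, not kernel code and not the proposed patch: every *_model name, the protection-depth counter, and the splat counter are hypothetical stand-ins invented for illustration. The assumption is that the checked accessor complains when the online state is sampled without hotplug protection, while the _nocheck variant returns the same answer and skips the complaint, for callers (like the RCU case above) that are known to be safe for other reasons.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model state: CPU 2 starts out offline. */
static bool cpu_online_map_model[NR_CPUS_MODEL] = { true, true, false, true };

/* >0 while the caller is holding off CPU offline (hypothetical counter). */
static int hotplug_protect_depth;

/* Number of debug warnings ("splats") raised so far. */
static int debug_splats;

/* Stand-ins for the get/put pair discussed in this thread. */
static void get_online_cpus_atomic_model(void) { hotplug_protect_depth++; }
static void put_online_cpus_atomic_model(void) { hotplug_protect_depth--; }

/* Unchecked variant: samples the state, never warns. */
static bool cpu_is_offline_nocheck_model(int cpu)
{
	return !cpu_online_map_model[cpu];
}

/* Checked variant: warns if sampled without hotplug protection. */
static bool cpu_is_offline_model(int cpu)
{
	if (hotplug_protect_depth == 0) {
		debug_splats++;
		fprintf(stderr,
			"splat: CPU %d sampled without hotplug protection\n",
			cpu);
	}
	return cpu_is_offline_nocheck_model(cpu);
}
```

In this model, both variants return the same value for a given CPU. The only difference is that the checked variant counts a splat when called outside a get/put pair, which is the false positive the _nocheck() variants are meant to avoid.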