Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
From: Michael Wang <hidden>
Date: 2012-12-05 03:29:12
Also in:
lkml
On 12/05/2012 10:56 AM, Michael Wang wrote: [...]
> I wonder about the cpu-online case.  A typical caller might want to do:
>
> /*
>  * Set each online CPU's "foo" to "bar"
>  */
>
> int global_bar;
>
> void set_cpu_foo(int bar)
> {
> 	get_online_cpus_stable_atomic();
> 	global_bar = bar;
> 	for_each_online_cpu_stable()
> 		cpu->foo = bar;
> 	put_online_cpus_stable_atomic();
> }
>
> void cpu_online_notifier_handler(void)
> {
> 	cpu->foo = global_bar;
> }
Oh, forgive me for misunderstanding your question :(

In this case we have to prevent hotplug from happening, not just ensure that the online mask is correct.

Hmm... this needs more consideration.

Regards,
Michael Wang
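To make "prevent hotplug" concrete, here is a minimal userspace model of the quoted set_cpu_foo() example. It is a sketch under stated assumptions, not kernel code and not what the patch does: hotplug_lock, sim_cpu_up() and the small spinning reader/writer lock built on C11 atomics are hypothetical stand-ins, and the cpu_foo[] array replaces the cpu->foo pseudocode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * Hypothetical stand-in for a global hotplug lock:
 * > 0 = number of readers, 0 = free, -1 = held by the hotplug writer.
 */
static atomic_int hotplug_lock;

static bool cpu_online[NR_CPUS];
static int cpu_foo[NR_CPUS];
static int global_bar;

static void hotplug_read_lock(void)
{
	int v = atomic_load(&hotplug_lock);

	for (;;) {
		if (v < 0) {	/* writer active: spin and re-read */
			v = atomic_load(&hotplug_lock);
			continue;
		}
		/* On failure the CAS refreshes v, so simply retry. */
		if (atomic_compare_exchange_weak(&hotplug_lock, &v, v + 1))
			return;
	}
}

static void hotplug_read_unlock(void)
{
	atomic_fetch_sub(&hotplug_lock, 1);
}

static void hotplug_write_lock(void)
{
	int expected = 0;

	/* Wait until there is no reader and no other writer. */
	while (!atomic_compare_exchange_weak(&hotplug_lock, &expected, -1))
		expected = 0;
}

static void hotplug_write_unlock(void)
{
	atomic_store(&hotplug_lock, 0);
}

/* Simulated hotplug: bring a CPU online with all readers excluded. */
static void sim_cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	hotplug_write_lock();
	cpu_online[cpu] = true;
	cpu_foo[cpu] = global_bar;	/* the online notifier's job */
	hotplug_write_unlock();
}

/*
 * Reader side: no CPU can come online between the write to global_bar
 * and the end of the loop. Concurrent callers of set_cpu_foo() would
 * still need to serialize among themselves.
 */
static void set_cpu_foo(int bar)
{
	hotplug_read_lock();
	global_bar = bar;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			cpu_foo[cpu] = bar;
	hotplug_read_unlock();
}
```

In this model, calling sim_cpu_up(0), then set_cpu_foo(5), then sim_cpu_up(1) leaves both CPUs with foo == 5: a CPU that comes online either does so before the reader section and is covered by the loop, or after it and picks up the new global_bar.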
> And I think that set_cpu_foo() would be buggy, because a CPU could come online before global_bar was altered, and that newly-online CPU would pick up the old value of `bar'.
>
> So what's the rule here? global_bar must be written before we run get_online_cpus_stable_atomic()?
>
> Anyway, please have a think and spell all this out?

That's right. This actually comes down to one question: should hotplug be prevented between get_online and put_online?

According to the old API, which uses a mutex, the answer is yes: hotplug won't happen inside the critical section, but the cost is that get_online() can block, which kills performance.

So we designed this mechanism to speed things up. But as you pointed out, although the result will never be wrong, the 'stable' mask is not really stable, since it can change inside the critical section.

We have two solutions.

One is from Srivatsa: use 'read_lock' and 'write_lock'. This prevents hotplug from happening, just like the old rule. The cost is that we need a global 'rw_lock', which performs badly on NUMA systems, and get_online() will certainly block for a short time while a hotplug is in progress.

The other is to maintain a per-cpu cached mask. This mask is only updated in get_online() and is used inside the critical section, so we get a really stable mask. The flaw is that cpus which are in a critical section at the same time may see different online masks.

We would appreciate comments on which one to use in the next version.

Regards,
Michael Wang
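The per-cpu cached mask idea can be modelled in a few lines of userspace C. This is only an illustrative sketch: get_online_cached(), cpu_online_cached(), sim_set_online(), sim_set_offline() and the cached_mask[] array are hypothetical names, "reader" stands in for the cpu running the critical section, and a real implementation would also need memory ordering and preemption handling that are not shown.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS    4	/* CPUs tracked in the bitmask */
#define NR_READERS 2	/* hypothetical number of reader contexts */

static unsigned long online_mask;		/* live mask, changed by hotplug */
static unsigned long cached_mask[NR_READERS];	/* one snapshot per reader */

/* Entering the critical section: take a private snapshot of the mask. */
static void get_online_cached(int reader)
{
	assert(reader >= 0 && reader < NR_READERS);
	cached_mask[reader] = online_mask;
}

/* Inside the critical section only the snapshot is consulted. */
static bool cpu_online_cached(int reader, int cpu)
{
	return (cached_mask[reader] >> cpu) & 1UL;
}

/* Simulated hotplug: changes the live mask, never the snapshots. */
static void sim_set_online(int cpu)
{
	online_mask |= 1UL << cpu;
}

static void sim_set_offline(int cpu)
{
	online_mask &= ~(1UL << cpu);
}
```

If reader 0 takes its snapshot while CPUs 0 and 1 are online, CPU 1 then goes offline, and reader 1 takes its snapshot afterwards, reader 0 keeps seeing CPU 1 as online for the rest of its section while reader 1 does not. Each snapshot is stable, but the two disagree, and the older one describes a CPU that is already gone. This is the flaw described above.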
>  struct take_cpu_down_param {
>  	unsigned long mod;
>  	void *hcpu;
> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>  static int __ref take_cpu_down(void *_param)
>  {
>  	struct take_cpu_down_param *param = _param;
> -	int err;
> +	int err, cpu = (long)(param->hcpu);
> +

Like this please:

	int err;
	int cpu = (long)(param->hcpu);

> +	prepare_cpu_take_down(cpu);
>  	/* Ensure this CPU doesn't handle any more interrupts. */
>  	err = __cpu_disable();

...