Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat <hidden>
Date: 2012-12-13 15:27:45
Also in: lkml
On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
>> On 12/13, Srivatsa S. Bhat wrote:
>>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>>> And _perhaps_ get_ can avoid it too?
>>>>
>>>> I didn't really try to think, probably this is not right, but can't
>>>> something like this work?
>>>>
>>>>	#define XXXX	(1 << 16)
>>>>	#define MASK	(XXXX -1)
>>>>
>>>>	void get_online_cpus_atomic(void)
>>>>	{
>>>>		preempt_disable();
>>>>
>>>>		// only for writer
>>>>		__this_cpu_add(reader_percpu_refcnt, XXXX);
>>>>
>>>>		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>>			__this_cpu_inc(reader_percpu_refcnt);
>>>>		} else {
>>>>			smp_wmb();
>>>>			if (writer_active()) {
>>>>				...
>>>>			}
>>>>		}
>>>>
>>>>		__this_cpu_dec(reader_percpu_refcnt, XXXX);
>>>>	}
>>>
>>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>>> of how the mask helps us avoid disabling interrupts..
>>
>> Why do we need cli/sti at all? We should prevent the following race:
>>
>>	- the writer already holds hotplug_rwlock, so get_ must not
>>	  succeed.
>>
>>	- the new reader comes, it increments reader_percpu_refcnt,
>>	  but before it checks writer_active() ...
>>
>>	- irq handler does get_online_cpus_atomic() and sees
>>	  reader_nested_percpu() == T, so it simply increments
>>	  reader_percpu_refcnt and succeeds.
>>
>> OTOH, why do we need to increment reader_percpu_refcnt the counter
>> in advance? To ensure that either we see writer_active() or the
>> writer should see reader_percpu_refcnt != 0 (and that is why they
>> should write/read in reverse order).
>>
>> The code above tries to avoid this race using the lower 16 bits
>> as a "nested-counter", and the upper bits to avoid the race with
>> the writer.
>>
>>	// only for writer
>>	__this_cpu_add(reader_percpu_refcnt, XXXX);
>>
>> If irq comes and does get_online_cpus_atomic(), it won't be confused
>> by __this_cpu_add(XXXX), it will check the lower bits and switch to
>> the "slow path".
>
> This is a very clever scheme indeed! :-) Thanks a lot for explaining
> it in detail.
>
>> But once again, so far I didn't really try to think. It is quite
>> possible I missed something.
>
> Even I don't spot anything wrong with it. But I'll give it some more
> thought..
Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
right?

To maintain the integrity of the update itself, we will have to use the
this_cpu_* variant, which basically plays spoil-sport on this whole
scheme... :-(

But still, this scheme is better, because the reader doesn't have to spin
on the read_lock() with interrupts disabled. That way, interrupt handlers
that are not hotplug readers can continue to run on this CPU while taking
another CPU offline.

Regards,
Srivatsa S. Bhat
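
For illustration only, here is a rough single-threaded user-space sketch of
the counter layout discussed above: the upper bits make a reader visible to
the writer, and the lower 16 bits count nesting. It is not kernel code and
not the patch itself. All names (sim_get, sim_put, irq_hook, writer_is_active,
the counters) are hypothetical, the "interrupt" is just a callback fired
inside the window right after the upper-bits add, and the body of the
writer_active() branch, which is elided in the quoted code, is only recorded
here rather than implemented. Incrementing the nested count at the end of the
slow path is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WRITER_VISIBLE	(1u << 16)		/* "XXXX" in the quoted code */
#define NESTED_MASK	(WRITER_VISIBLE - 1)	/* "MASK" in the quoted code */

static unsigned int reader_refcnt;	/* stand-in for one CPU's reader_percpu_refcnt */
static bool writer_is_active;		/* stand-in for writer_active() */
static int slow_path_taken;		/* how often the outermost-reader path ran */
static int would_block;			/* how often a reader would wait for the writer */
static void (*irq_hook)(void);		/* simulated interrupt, fired at most once */

static void sim_get(void)
{
	/* Visible to the writer only: the nested count (low bits) is untouched. */
	reader_refcnt += WRITER_VISIBLE;

	/* Simulated interrupt arriving before the low bits are examined. */
	if (irq_hook) {
		void (*hook)(void) = irq_hook;

		irq_hook = NULL;
		hook();
	}

	if (reader_refcnt & NESTED_MASK) {
		/* Nested inside a reader that fully succeeded: fast path. */
		reader_refcnt++;
	} else {
		/* Outermost reader, or an irq that hit the window above. */
		slow_path_taken++;
		if (writer_is_active)
			would_block++;	/* real code would wait here (elided in the thread) */
		reader_refcnt++;	/* assumption: mark this reader as nested-visible */
	}

	reader_refcnt -= WRITER_VISIBLE;
}

static void sim_put(void)
{
	assert(reader_refcnt & NESTED_MASK);	/* must pair with a sim_get() */
	reader_refcnt--;
}

/* A simulated irq handler that is itself a hotplug reader. */
static void sim_irq_handler(void)
{
	sim_get();
	sim_put();
}
```

With irq_hook set to sim_irq_handler before an outermost sim_get(), the
handler sees the low bits still zero and goes through the slow path as well,
instead of succeeding just because the counter is non-zero. A sim_get()
nested inside a completed one takes the fast path.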