Thread (40 messages), 3 authors, 2012-12-24

Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

From: Srivatsa S. Bhat <hidden>
Date: 2012-12-13 15:27:45
Also in: lkml

On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
On 12/13, Srivatsa S. Bhat wrote:
On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
And _perhaps_ get_ can avoid it too?

I didn't really try to think, probably this is not right, but can't
something like this work?

	#define XXXX	(1 << 16)
	#define MASK	(XXXX - 1)

	void get_online_cpus_atomic(void)
	{
		preempt_disable();

		// only for writer
		__this_cpu_add(reader_percpu_refcnt, XXXX);

		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
			__this_cpu_inc(reader_percpu_refcnt);
		} else {
			smp_wmb();
			if (writer_active()) {
				...
			}
		}

		__this_cpu_sub(reader_percpu_refcnt, XXXX);
	}
Sorry, maybe I'm too blind to see, but I didn't understand the logic
of how the mask helps us avoid disabling interrupts.
Why do we need cli/sti at all? We should prevent the following race:

	- the writer already holds hotplug_rwlock, so get_ must not
	  succeed.

	- the new reader comes, it increments reader_percpu_refcnt,
	  but before it checks writer_active() ...

	- irq handler does get_online_cpus_atomic() and sees
	  reader_nested_percpu() == T, so it simply increments
	  reader_percpu_refcnt and succeeds.

OTOH, why do we need to increment the reader_percpu_refcnt counter
in advance? To ensure that either we see writer_active() or the
writer sees reader_percpu_refcnt != 0 (and that is why they
should write/read in reverse order).

The code above tries to avoid this race using the lower 16 bits
as a "nested-counter", and the upper bits to avoid the race with
the writer.

	// only for writer
	__this_cpu_add(reader_percpu_refcnt, XXXX);

If irq comes and does get_online_cpus_atomic(), it won't be confused
by __this_cpu_add(XXXX), it will check the lower bits and switch to
the "slow path".
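
The split counter described above can be walked through in a small
userspace model. This is only a sketch: refcnt is a plain variable
standing in for one CPU's reader_percpu_refcnt, and WRITER_GUARD,
NEST_MASK, reader_put() and the other helper names are made up for
illustration. It also assumes that a successful get ends by taking a
nested reference and that put simply drops it; the mail elides the slow
path and does not show put, so both are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define WRITER_GUARD	(1u << 16)		/* XXXX in the mail above */
#define NEST_MASK	(WRITER_GUARD - 1)	/* MASK in the mail above */

/* Model of one CPU's reader_percpu_refcnt (plain variable, not per-CPU). */
static unsigned int refcnt;

/* What a nested caller (e.g. an irq handler) tests: the low 16 bits only. */
static bool nested_fast_path_allowed(void)
{
	return (refcnt & NEST_MASK) != 0;
}

/* What the writer tests: any bit set means a reader is present or arriving. */
static bool writer_must_wait(void)
{
	return refcnt != 0;
}

/* First step of get: become visible to the writer, leave the nest count alone. */
static void reader_announce(void)
{
	refcnt += WRITER_GUARD;
}

/* Last steps of a successful get (assumed): take a nested ref, drop the guard. */
static void reader_commit(void)
{
	refcnt += 1;
	refcnt -= WRITER_GUARD;
}

/* Assumed put: drop the nested reference. */
static void reader_put(void)
{
	refcnt -= 1;
}

/* Walk through one get/put and check what an irq would see at each point. */
static unsigned int check_irq_window(void)
{
	refcnt = 0;

	reader_announce();
	assert(writer_must_wait());		/* writer already sees us */
	assert(!nested_fast_path_allowed());	/* irq here takes the slow path */

	reader_commit();
	assert(nested_fast_path_allowed());	/* irq here may nest cheaply */

	reader_put();
	return refcnt;				/* back to zero if balanced */
}
```

Calling check_irq_window() runs the three checks and returns the final
counter value. The point of the model is the middle check: in the window
between announcing and committing, the upper bits are set but the lower
16 bits are still zero, so a nested caller cannot mistake the guard for
an existing reference.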
This is a very clever scheme indeed! :-) Thanks a lot for explaining
it in detail.
But once again, so far I didn't really try to think. It is quite
possible I missed something.
I don't spot anything wrong with it either. But I'll give it some more
thought.
Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
right?

To maintain the integrity of the update itself, we will have to use the
this_cpu_* variant, which basically plays spoil-sport on this whole
scheme... :-(

But still, this scheme is better, because the reader doesn't have to spin
on the read_lock() with interrupts disabled. That way, interrupt handlers
that are not hotplug readers can continue to run on this CPU while taking
another CPU offline.
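
The integrity concern can be modelled in userspace like this. It is a
sketch under assumptions, not kernel code: it assumes the update is a
separate load and store, and split_add(), irq_balanced(),
irq_unbalanced() and run_case() are hypothetical names. Whether a real
handler always drops its reference before returning is not settled in
this mail; the model just shows both cases.

```c
#include <assert.h>
#include <stddef.h>

#define WRITER_GUARD	(1u << 16)	/* XXXX in the scheme above */

/* Model of one CPU's counter; a plain variable, not a real per-CPU one. */
static unsigned int counter;

typedef void (*irq_fn)(void);

/*
 * Non-atomic add split into load and store, with an optional "interrupt"
 * landing in between. The store overwrites whatever the interrupt left.
 */
static void split_add(unsigned int val, irq_fn irq)
{
	unsigned int tmp = counter;	/* load */

	if (irq != NULL)
		irq();			/* interrupt arrives mid-update */

	counter = tmp + val;		/* store */
}

/* Handler that takes and drops a reference before returning. */
static void irq_balanced(void)
{
	counter += 1;
	counter -= 1;
}

/* Handler that leaves a net change behind. */
static void irq_unbalanced(void)
{
	counter += 1;
}

/* Start from zero, do one interrupted add, return the resulting value. */
static unsigned int run_case(irq_fn irq)
{
	counter = 0;
	split_add(WRITER_GUARD, irq);
	return counter;
}
```

In this model run_case(irq_balanced) yields 65536, which is the intended
value. run_case(irq_unbalanced) also yields 65536, although 65537 was
intended: the handler's increment is lost. So the non-atomic update only
misbehaves here when the interrupting code leaves the counter different
from how it found it.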

Regards,
Srivatsa S. Bhat