Thread: 5 messages, 2 authors, 2014-05-28

Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe

From: NeilBrown <hidden>
Date: 2014-05-28 04:54:35

On Wed, 28 May 2014 11:45:07 +0800 Shaohua Li [off-list ref] wrote:
On Wed, May 28, 2014 at 12:59:37PM +1000, NeilBrown wrote:
On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li [off-list ref] wrote:
This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
handles finished stripes, which really should be the first thing to do. The
original changelog says checking reconstruct_state should be the first as
handle_stripe_clean_event can clear some dev->flags and impact checking
reconstruct_state code path. It's unclear to me why this happens, because I
thought a finished write and reconstruct_state being equal to *_result can't
happen at the same time.
"unclear to me" and "I thought" are not sufficient to justify a change, though
they are certainly sufficient to ask a question.

Are you asking a question or submitting a change?

You may well be correct that if reconstruct_state is not
reconstruct_state_idle, then handle_stripe_clean_event cannot possibly be
called.  In that case, maybe we should change the code flow to make that more
obvious, but certainly the changelog comment should be clear about exactly
why.
I'm sorry, it's more like a question. I really didn't understand why we have
ef5b7c69b7a1b8b8744a6168b6f, so I'm not 100% sure about it. It would be great
if you could share a hint.
It's a while ago and I don't remember, but I suspect that I added that patch
because handle_stripe_clean_event was about to change to clear R5_UPTODATE,
and this code which was previously *after* handle_stripe_clean_event tested
R5_UPTODATE (and could BUG if it wasn't set).

You may well be right that the two pieces of code cannot both run in the one
invocation of handle_stripe().  I haven't analysed the code closely to be
sure, but on casual reflection it seems likely.  However we always need to be
careful of races in unusual situations.

If that is correct, and if there are two (or more) different situations in
which handle_stripe runs, maybe one after IO has completed and one after
reconstruction has completed, and one when new devices have been added,
then there might be value in clearly delineating these so we don't bother
testing for cases that cannot happen.

If it is not correct, then your proposed change might be dangerous.

 
I also moved checking reconstruct_state code path after handle_stripe_dirtying.
If that code sets reconstruct_state to reconstruct_state_idle, the order change
will make us miss one handle_stripe_dirtying. But the stripe will eventually be
handled again when the write is finished.
You haven't said here why this patch is a good thing, only why it isn't
obviously bad.  I really need some justification to make a change and you
haven't provided any, at least not in this changelog comment.
ok, I'll add more about this.
 
Maybe we need a completely different approach.
Instead of repeatedly shuffling code inside handle_stripe(), how about we put
all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
and sh->count == 1.
ie.

	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
		/* already being handled, ensure it gets handled
		 * again when current action finishes */
		set_bit(STRIPE_HANDLE, &sh->state);
		return;
	}

	do {
		clear_bit(STRIPE_HANDLE, &sh->state);
		__handle_stripe(sh);
	} while (test_bit(STRIPE_HANDLE, &sh->state) &&
		 atomic_read(&sh->count) == 1);
	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);


where the rest of the current handle_stripe() goes into __handle_stripe().
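
For anyone who wants to experiment with this re-run pattern outside the kernel, a minimal user-space sketch follows. It is not the raid5 code: it assumes C11 atomics as stand-ins for the kernel bit operations, and the struct layout, the `passes` and `extra_work` fields, and the helpers `stripe_init()` and `handle_stripe_body()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's stripe state bits. */
enum { STRIPE_ACTIVE = 1 << 0, STRIPE_HANDLE = 1 << 1 };

struct stripe {
	atomic_uint state;	/* bit flags above */
	atomic_int count;	/* reference count */
	int passes;		/* illustration only: passes made by the body */
	int extra_work;		/* illustration only: times the body asks to be re-run */
};

static void stripe_init(struct stripe *sh, int count, int extra_work)
{
	atomic_init(&sh->state, 0u);
	atomic_init(&sh->count, count);
	sh->passes = 0;
	sh->extra_work = extra_work;
}

/* Stand-in for __handle_stripe(): one pass, which may request another. */
static void handle_stripe_body(struct stripe *sh)
{
	/* The body must only run while STRIPE_ACTIVE is held. */
	assert(atomic_load(&sh->state) & STRIPE_ACTIVE);
	sh->passes++;
	if (sh->extra_work > 0) {
		sh->extra_work--;
		atomic_fetch_or(&sh->state, (unsigned)STRIPE_HANDLE);
	}
}

/* Returns true if this caller ran the body, false if another holder was active. */
static bool handle_stripe(struct stripe *sh)
{
	/* Rough equivalent of test_and_set_bit_lock(STRIPE_ACTIVE, ...). */
	if (atomic_fetch_or_explicit(&sh->state, (unsigned)STRIPE_ACTIVE,
				     memory_order_acquire) & STRIPE_ACTIVE) {
		/* Already being handled: ask the holder to go around again. */
		atomic_fetch_or(&sh->state, (unsigned)STRIPE_HANDLE);
		return false;
	}

	do {
		atomic_fetch_and(&sh->state, ~(unsigned)STRIPE_HANDLE);
		handle_stripe_body(sh);
	} while ((atomic_load(&sh->state) & STRIPE_HANDLE) &&
		 atomic_load(&sh->count) == 1);

	/* Rough equivalent of clear_bit_unlock(STRIPE_ACTIVE, ...). */
	atomic_fetch_and_explicit(&sh->state, ~(unsigned)STRIPE_ACTIVE,
				  memory_order_release);
	return true;
}
```

In this sketch a stripe initialised with count 1 and extra_work 2 is processed three times in a single call, while one with count 2 stops after the first pass and is left with STRIPE_HANDLE set for a later caller.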

Would that address your performance concerns, or is there still too much
overhead?
Let me try. One issue here is that we still have massive cache misses when
checking stripe/dev state. I suppose this won't help, but data should prove it.
That would be great - thanks.
If you can identify exactly where the cache misses are causing a problem, we
might be able to optimise around that.

NeilBrown
