Re: raid5d hangs when stopping an array during reshape

From: Shaohua Li <shli@kernel.org>
Date: 2016-02-25 01:17:10

On Thu, Feb 25, 2016 at 11:31:04AM +1100, Neil Brown wrote:

On Thu, Feb 25 2016, Shaohua Li wrote:


As for the bug, write requests run in raid5d, mddev_suspend() waits for all IO,
which waits for the write requests. So this is a clear deadlock. I think we
should delete the check_reshape() in md_check_recovery(). If we change
layout/disks/chunk_size, check_reshape() is already called. If we start an
array, .run() already handles the new layout. There is no point in
md_check_recovery() calling check_reshape() again.

Are you sure?
Did you look at the commit which added that code?
commit b4c4c7b8095298ff4ce20b40bf180ada070812d0

When there is an IO error, reshape (or resync or recovery) will abort
and then possibly be automatically restarted.

Thanks for pointing this out.

Without the check here a reshape might be attempted on an array which
has failed.  Not sure if that would be harmful, but it would certainly
be pointless.

But you are right that this is causing the problem.
Maybe we should keep track of the size of the 'scribble' arrays and only
call resize_chunks if the size needs to change?  Similar to what
resize_stripes does.
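
A minimal user-space sketch of that idea, not the actual raid5 code: remember
the size the scratch buffer was last allocated for and return early when the
required size is unchanged, so the expensive path (which in the kernel would
need the array suspended) runs only on a real change. The struct, the size
formula and the function names below are hypothetical stand-ins chosen for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scribble state; not the kernel structure. */
struct scribble {
	void *buf;		/* scratch buffer */
	size_t size;		/* size buf was last allocated for, 0 if none */
	unsigned int resizes;	/* counts real reallocations, illustration only */
};

/* Assumed size formula: one pointer-sized slot per disk per page of a chunk. */
static size_t scribble_size_needed(unsigned int disks, unsigned int chunk_pages)
{
	return (size_t)disks * chunk_pages * sizeof(void *);
}

/*
 * Resize only when the required size actually changes.
 * Returns 0 on success (including the no-op case), -1 if allocation fails.
 */
static int scribble_resize(struct scribble *s, unsigned int disks,
			   unsigned int chunk_pages)
{
	size_t need = scribble_size_needed(disks, chunk_pages);
	void *p;

	assert(s != NULL && need > 0);

	if (need == s->size)
		return 0;	/* unchanged: no suspend, no allocation */

	/* In the kernel, this is the point where the array would be quiesced. */
	p = realloc(s->buf, need);
	if (!p)
		return -1;	/* old buffer and recorded size stay valid */

	s->buf = p;
	s->size = need;
	s->resizes++;
	return 0;
}
```

With this shape, a repeated call with the same geometry is a no-op, so a
periodic caller never reaches the path that waits for outstanding IO unless
the geometry really changed.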

Yep, this was my first solution, but I later thought check_reshape() was
useless here; apparently I missed the restart case. I'll go this way.

It might also be good to put something like
  WARN_ON(current == mddev->thread->task);
in mddev_suspend() ... or whatever code would cause this sort of error
to trigger a warning early.
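
The same kind of check can be illustrated in user space with pthreads. This is
only an analogy under stated assumptions: fake_mddev and fake_suspend_check()
are hypothetical names, and pthread_equal() stands in for comparing current
against the worker's task. The point is that a suspend path which waits for
work done by one thread can cheaply detect that it is being entered from that
very thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an md device with one worker thread. */
struct fake_mddev {
	pthread_t worker;	/* thread that completes IO (raid5d's role) */
	bool has_worker;	/* false until a worker has been registered */
};

/*
 * Returns true and prints a warning when called from the worker thread
 * itself, i.e. when waiting for the worker here could never finish.
 */
static bool fake_suspend_check(const struct fake_mddev *m)
{
	assert(m != NULL);

	if (m->has_worker && pthread_equal(pthread_self(), m->worker)) {
		fprintf(stderr,
			"WARNING: suspend called from the worker thread\n");
		return true;
	}
	return false;
}
```

Calling the check first thing in the suspend path reports the misuse
immediately instead of leaving a silent hang to be debugged later.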

Sounds good.

Thanks,
Shaohua
