Re: [PATCH 1/1] md/raid10: avoid deadlock on recovery.
From: Song Liu <hidden>
Date: 2020-07-22 06:18:19
On Tue, Jul 21, 2020 at 7:26 AM Nigel Croxon [off-list ref] wrote:
> On Mar 3, 2020, at 1:14 PM, Vitaly Mayatskikh [off-list ref] wrote:
>>
>> When disk failure happens and the array has a spare drive, resync thread
>> kicks in and starts to refill the spare. However it may get blocked by
>> a retry thread that resubmits failed IO to a mirror and itself can get
>> blocked on a barrier raised by the resync thread.
>>
>> Signed-off-by: Vitaly Mayatskikh <redacted>
>> ---
>>  drivers/md/raid10.c | 14 +++++++++++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index ec136e4..f1a8e26 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -980,6 +980,7 @@ static void wait_barrier(struct r10conf *conf)
>>  {
>>  	spin_lock_irq(&conf->resync_lock);
>>  	if (conf->barrier) {
>> +		struct bio_list *bio_list = current->bio_list;
>>  		conf->nr_waiting++;
>>  		/* Wait for the barrier to drop.
>>  		 * However if there are already pending
>> @@ -994,9 +995,16 @@ static void wait_barrier(struct r10conf *conf)
>>  		wait_event_lock_irq(conf->wait_barrier,
>>  				    !conf->barrier ||
>>  				    (atomic_read(&conf->nr_pending) &&
>> -				     current->bio_list &&
>> -				     (!bio_list_empty(&current->bio_list[0]) ||
>> -				      !bio_list_empty(&current->bio_list[1]))),
>> +				     bio_list &&
>> +				     (!bio_list_empty(&bio_list[0]) ||
>> +				      !bio_list_empty(&bio_list[1]))) ||
>> +				     /* move on if recovery thread is
>> +				      * blocked by us
>> +				      */
>> +				     (conf->mddev->thread->tsk == current &&
>> +				      test_bit(MD_RECOVERY_RUNNING,
>> +					       &conf->mddev->recovery) &&
>> +				      conf->nr_queued > 0),
>>  				    conf->resync_lock);
>>  		conf->nr_waiting--;
>>  		if (!conf->nr_waiting)
>> -- 
>> 1.8.3.1
>
> Song,
>
> Have you had a chance to look at this patch? We would like to have it
> pulled in to the kernel.

I am sorry I missed this one. This looks good to me.

Nigel, would you like to add your Reviewed-by, Acked-by, or Tested-by tag?

Thanks,
Song