Re: [PATCH 1/1] md/raid10: avoid deadlock on recovery.

From: Song Liu <hidden>
Date: 2020-07-22 06:18:19

On Tue, Jul 21, 2020 at 7:26 AM Nigel Croxon [off-list ref] wrote:

On Mar 3, 2020, at 1:14 PM, Vitaly Mayatskikh [off-list ref] wrote:

When a disk failure happens and the array has a spare drive, the resync
thread kicks in and starts to refill the spare. However, it may get blocked
by a retry thread that resubmits failed IO to a mirror, and that retry thread
can itself get blocked on a barrier raised by the resync thread.

Signed-off-by: Vitaly Mayatskikh <redacted>
---
drivers/md/raid10.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ec136e4..f1a8e26 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -980,6 +980,7 @@ static void wait_barrier(struct r10conf *conf)
{
      spin_lock_irq(&conf->resync_lock);
      if (conf->barrier) {
+             struct bio_list *bio_list = current->bio_list;
              conf->nr_waiting++;
              /* Wait for the barrier to drop.
               * However if there are already pending
@@ -994,9 +995,16 @@ static void wait_barrier(struct r10conf *conf)
              wait_event_lock_irq(conf->wait_barrier,
                                  !conf->barrier ||
                                  (atomic_read(&conf->nr_pending) &&
-                                  current->bio_list &&
-                                  (!bio_list_empty(&current->bio_list[0]) ||
-                                   !bio_list_empty(&current->bio_list[1]))),
+                                  bio_list &&
+                                  (!bio_list_empty(&bio_list[0]) ||
+                                   !bio_list_empty(&bio_list[1]))) ||
+                                  /* move on if recovery thread is
+                                   * blocked by us
+                                   */
+                                  (conf->mddev->thread->tsk == current &&
+                                   test_bit(MD_RECOVERY_RUNNING,
+                                            &conf->mddev->recovery) &&
+                                   conf->nr_queued > 0),
                                  conf->resync_lock);
              conf->nr_waiting--;
              if (!conf->nr_waiting)

--
1.8.3.1
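
For readers following the second hunk, the patched wake-up condition can be read as a standalone predicate. This is only a minimal userspace sketch, not kernel code: `struct barrier_state`, `may_proceed()` and `check_recovery_case()` are hypothetical names introduced for illustration, and the fields are a flattened snapshot of the values that wait_barrier() reads under conf->resync_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state inspected by wait_barrier(). */
struct barrier_state {
	int  barrier;           /* conf->barrier */
	int  nr_pending;        /* atomic_read(&conf->nr_pending) */
	int  nr_queued;         /* conf->nr_queued */
	bool has_bio_list;      /* current->bio_list != NULL */
	bool bio_list_nonempty; /* bio_list[0] or bio_list[1] has entries */
	bool is_md_thread;      /* conf->mddev->thread->tsk == current */
	bool recovery_running;  /* MD_RECOVERY_RUNNING set in mddev->recovery */
};

/* Returns true when the waiter would stop waiting on the barrier. */
static bool may_proceed(const struct barrier_state *s)
{
	/* Case 1: the barrier has dropped. */
	if (!s->barrier)
		return true;

	/* Case 2 (pre-existing): requests are pending and the caller
	 * still has bios queued on its own bio_list. */
	if (s->nr_pending && s->has_bio_list && s->bio_list_nonempty)
		return true;

	/* Case 3 (added by the patch): the caller is the md thread,
	 * recovery is running, and retries are queued behind it. */
	if (s->is_md_thread && s->recovery_running && s->nr_queued > 0)
		return true;

	return false;
}

/* Usage: the situation described in the commit message. */
static void check_recovery_case(void)
{
	struct barrier_state s = {
		.barrier = 1, .nr_queued = 1,
		.is_md_thread = true, .recovery_running = true,
	};

	assert(may_proceed(&s));   /* new clause lets the md thread move on */
	s.nr_queued = 0;
	assert(!may_proceed(&s));  /* nothing queued: keep waiting */
}
```

Without the third clause, the first assertion in check_recovery_case() would not hold, which corresponds to the case where the md thread waits on the barrier while the resync thread waits on the queued retries.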

Song, have you had a chance to look at this patch?
We would like to have it pulled into the kernel.

I am sorry I missed this one. This looks good to me.

Nigel, would you like to add your Reviewed-by, or Acked-by, or Tested-by tag?

Thanks,
Song
