Thread (7 messages) 7 messages, 2 authors, 2023-06-02

Re: [PATCH v5 1/2] md/raid10: fix incorrect done of recovery

From: Li Nan <hidden>
Date: 2023-06-02 06:58:09
Also in: lkml


在 2023/6/1 15:06, Paul Menzel 写道:
Dear Li,


Thank you for your patch.

Am 01.06.23 um 08:24 schrieb linan666@huaweicloud.com:
quoted
From: Li Nan <redacted>
Unfortunately, I do not understand your commit message summary “fix 
incorrect done of recovery”. Maybe:

Do not add sparse disk when recovery aborts
"recovery fail" is better?
quoted
In raid10_sync_request(), if data cannot be read from any disk for
recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
return 'max_sector', indicating that the recovery has been completed.
However, the recovery is just aborted and the data remains inconsistent.

Fix it by setting mirror->recovery_disabled, which will prevent the spare
disk from being added to this mirror. The same issue also exists during
resync, it will be fixed afterwards.

Signed-off-by: Li Nan <redacted>
---
  drivers/md/raid10.c | 18 +++++++++++++++++-
  1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d93d8cb2b620..3ba1516ea160 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev 
*mddev, sector_t sector_nr,
      int chunks_skipped = 0;
      sector_t chunk_mask = conf->geo.chunk_mask;
      int page_idx = 0;
+    int error_disk = -1;
      /*
       * Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,20 @@ static sector_t raid10_sync_request(struct 
mddev *mddev, sector_t sector_nr,
          return reshape_request(mddev, sector_nr, skipped);
      if (chunks_skipped >= conf->geo.raid_disks) {
-        /* if there has been nothing to do on any drive,
+        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
: "recovery");
+        if (error_disk >= 0 &&
+            !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+            /*
+             * recovery fail, set mirrors.recovory_disabled,
recov*e*ry
quoted
+             * device shouldn't be added to there.
+             */
+            conf->mirrors[error_disk].recovery_disabled =
+                        mddev->recovery_disabled;
+            return 0;
+        }
+        /*
+         * if there has been nothing to do on any drive,
           * then there is nothing to do at all..
Just one dot/period at the end?
Thanks for your suggestion. I will change it in next version.

-- 
Thanks,
Nan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help