Re: [PATCH v5 1/2] md/raid10: fix incorrect done of recovery
From: Li Nan <hidden>
Date: 2023-06-02 06:58:09
Also in:
lkml
在 2023/6/1 15:06, Paul Menzel 写道:
Dear Li, Thank you for your patch. Am 01.06.23 um 08:24 schrieb linan666@huaweicloud.com:quoted
From: Li Nan <redacted>Unfortunately, I do not understand your commit message summary “fix incorrect done of recovery”. Maybe: Do not add sparse disk when recovery aborts
"recovery fail" is better?
quoted
In raid10_sync_request(), if data cannot be read from any disk for recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will return 'max_sector', indicating that the recovery has been completed. However, the recovery is just aborted and the data remains inconsistent. Fix it by setting mirror->recovery_disabled, which will prevent the spare disk from being added to this mirror. The same issue also exists during resync, it will be fixed afterwards. Signed-off-by: Li Nan <redacted> --- drivers/md/raid10.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index d93d8cb2b620..3ba1516ea160 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev*mddev, sector_t sector_nr, int chunks_skipped = 0; sector_t chunk_mask = conf->geo.chunk_mask; int page_idx = 0; + int error_disk = -1; /* * Allow skipping a full rebuild for incremental assembly@@ -3386,7 +3387,20 @@ static sector_t raid10_sync_request(structmddev *mddev, sector_t sector_nr, return reshape_request(mddev, sector_nr, skipped); if (chunks_skipped >= conf->geo.raid_disks) { - /* if there has been nothing to do on any drive, + pr_err("md/raid10:%s: %s fail\n", mdname(mddev), + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery"); + if (error_disk >= 0 && + !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { + /* + * recovery fail, set mirrors.recovory_disabled,recov*e*ryquoted
+ * device shouldn't be added to there. + */ + conf->mirrors[error_disk].recovery_disabled = + mddev->recovery_disabled; + return 0; + } + /* + * if there has been nothing to do on any drive, * then there is nothing to do at all..Just one dot/period at the end?
Thanks for your suggestion. I will change it in next version. -- Thanks, Nan