Re: [PATCH v5 01/14] md: don't ignore suspended array in md_check_recovery()
From: Yu Kuai <hidden>
Date: 2024-02-18 01:15:04
Also in:
dm-devel, lkml
Hi, 在 2024/02/16 14:58, Xiao Ni 写道:
On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai [off-list ref] wrote:quoted
From: Yu Kuai <redacted> mddev_suspend() never stop sync_thread, hence it doesn't make sense to ignore suspended array in md_check_recovery(), which might cause sync_thread can't be unregistered. After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following hang can be triggered by test shell/integrity-caching.sh:Hi Kuai After applying this patch, it's still stuck at mddev_suspend. Maybe the deadlock can be fixed by other patches from the patch set. But this patch can't fix this issue. If so, the comment is not right.
This patch alone can't fix the problem that mddev_suspend() can stuck thoroughly, patches 1-4 will all be needed. Thanks, Kuai
quoted
1) suspend the array: raid_postsuspend mddev_suspend 2) stop the array: raid_dtr md_stop __md_stop_writes stop_sync_thread set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_wakeup_thread_directly(mddev->sync_thread); wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) 3) sync thread done: md_do_sync set_bit(MD_RECOVERY_DONE, &mddev->recovery); md_wakeup_thread(mddev->thread); 4) daemon thread can't unregister sync thread: md_check_recovery if (mddev->suspended) return; -> return directly md_read_sync_thread clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hang;I add some debug logs when stopping dmraid with lvremove command. The step you mentioned are sequential but not async. The process is : dev_remove->dm_destroy->__dm_destroy->dm_table_postsuspend_targets(raid_postsuspend) -> dm_table_destroy(raid_dtr). It looks like mddev_suspend is waiting for active_io to be zero. Best Regards Xiaoquoted
This problem is not just related to dm-raid, fix it by ignoring suspended array in md_check_recovery(). And follow up patches will improve dm-raid better to frozen sync thread during suspend. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/ (local) Fixes: 68866e425be2 ("MD: no sync IO while suspended") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Signed-off-by: Yu Kuai <redacted> --- drivers/md/md.c | 3 --- 1 file changed, 3 deletions(-)diff --git a/drivers/md/md.c b/drivers/md/md.c index 2266358d8074..07b80278eaa5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c@@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws) */ void md_check_recovery(struct mddev *mddev) { - if (READ_ONCE(mddev->suspended)) - return; - if (mddev->bitmap) md_bitmap_daemon_work(mddev); --2.39.2.