Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

From: Guoqing Jiang <hidden>
Date: 2021-01-26 14:10:20
Also in: lkml


On 1/26/21 13:58, Donald Buczek wrote:

>> Hmm, how about waking the waiter up in the while loop of raid5d?

>> @@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
>>                          md_check_recovery(mddev);
>>                          spin_lock_irq(&conf->device_lock);
>>                  }
>> +
>> +               if ((atomic_read(&conf->active_stripes)
>> +                    < (conf->max_nr_stripes * 3 / 4) ||
>> +                    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
>> +                       wake_up(&conf->wait_for_stripe);
>>          }
>>          pr_debug("%d stripes handled\n", handled);
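
(Illustration only, not part of the patch.) The wake-up test in the raid5d hunk can be modeled in userspace to see when it fires. This is a minimal sketch under stated assumptions: `should_wake_stripe_waiter` is a hypothetical helper, and its plain arguments stand in for atomic_read(&conf->active_stripes), conf->max_nr_stripes and test_bit(MD_RECOVERY_INTR, &mddev->recovery).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the wake-up test in the proposed raid5d() hunk.
 * Hypothetical helper, not a kernel function: plain values replace the
 * atomic counter, the stripe limit and the MD_RECOVERY_INTR bit.
 */
static bool should_wake_stripe_waiter(int active_stripes, int max_nr_stripes,
                                      bool recovery_interrupted)
{
    assert(active_stripes >= 0 && max_nr_stripes >= 0);

    /* Same integer arithmetic as the hunk: multiply by 3, then divide by 4. */
    return active_stripes < (max_nr_stripes * 3 / 4) || recovery_interrupted;
}
```

With max_nr_stripes = 256 the threshold is 192, so 191 active stripes wake the waiter, while 192 do so only if the interrupt flag is set.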

> Hmm... With this patch on top of your other one, we still have the basic
> symptoms (md3_raid6 busy looping), but the sync thread is now hanging at
>
>      root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
>      [<0>] md_do_sync.cold+0x8ec/0x97c
>      [<0>] md_thread+0xab/0x160
>      [<0>] kthread+0x11b/0x140
>      [<0>] ret_from_fork+0x22/0x30
>
> instead, which is
> https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963

I'm not sure why recovery_active is nonzero: it is set to 0 before
blk_start_plug, raid5_sync_request returns 0, and skipped is also set
to 1. Perhaps handle_stripe calls md_done_sync.

Could you double-check the value of recovery_active? Or just don't wait
if the resync thread is interrupted:

wait_event(mddev->recovery_wait,
	   test_bit(MD_RECOVERY_INTR,&mddev->recovery) ||
	   !atomic_read(&mddev->recovery_active));
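
(Illustration only.) The effect of adding the MD_RECOVERY_INTR test to the wait condition can be compared in a small userspace model. This is a sketch: both helpers are hypothetical, and `wait_done_current` assumes the existing wait in md_do_sync checks only that recovery_active has dropped to zero.

```c
#include <stdbool.h>

/*
 * Assumed existing condition: the wait ends only once no sync I/O is
 * outstanding (recovery_active == 0).
 */
static bool wait_done_current(int recovery_active)
{
    return recovery_active == 0;
}

/*
 * Condition from the snippet above: an interrupted sync also ends the
 * wait, even while recovery_active is still nonzero.
 */
static bool wait_done_proposed(bool recovery_interrupted, int recovery_active)
{
    return recovery_interrupted || recovery_active == 0;
}
```

The two differ only when the sync was interrupted while recovery_active is still nonzero: the assumed current condition keeps waiting, the proposed one returns.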

> And, unlike before, "md: md3: data-check interrupted." from the pr_info
> two lines above appears in dmesg.

Yes, that is intentional, since MD_RECOVERY_INTR is set by writing "idle" to sync_action.

Anyway, will try the script and investigate more about the issue.

Thanks,
Guoqing
