Thread (34 messages) 34 messages, 5 authors, 2024-01-27

Re: [PATCH 2/7] md: fix a race condition when stopping the sync thread

From: Yu Kuai <hidden>
Date: 2024-01-18 13:20:16
Also in: dm-devel

Hi,

在 2024/01/18 21:07, Mikulas Patocka 写道:

On Thu, 18 Jan 2024, Yu Kuai wrote:
quoted
Hi,

在 2024/01/18 2:18, Mikulas Patocka 写道:
quoted
Note that md_wakeup_thread_directly is racy - it will do nothing if the
thread is already running or it may cause spurious wake-up if the thread
is blocked in another subsystem.
No, as the comment said, md_wakeup_thread_directly() is just to prevent
that md_wakeup_thread() can't wake up md_do_sync() if it's waiting for
metadata update.
Yes - but what happens if you wake up the thread just a few instructions
before it is going to sleep for metadata update? wake_up_process does
nothing on a running process and the thread proceeds with waiting. This is
what I thought could happen when I was making the patch.
Please notice that in the orginal code md_wakeup_thread_directly() is
used for sync_thread, and md_wakeup_thread() should be used for
*mddev->thread* (mddev_unlock always do that) to clear
MD_RECOVERY_RUNNING.

By the way, the root cause that MD_RECOVERY_RUNNING is not cleared is
that mddev_suspend() never stop sync_thread at all, while
md_check_recovery() won't do anything when mddev is suspended.

Before:
1. suspend
2. call md_reap_sync_thread() directly to unregister sync_thread
     -> notice that this is not safe.
3. resume

Now:
1. suspend
2. call stop_sync_thread() to unregister sync_thread interrupt
md_do_sync() and wait for md_check_recovery() to clear
MD_RECOVERY_RUNNING.
    -> which will never happen now;
3. resume

I fixed this locally and the test integrity-caching.sh passed in my VM.

Thanks,
Kuai
Mikulas
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help