Re: [PATCH v3 1/2] md/cluster: reshape should returns error when remote doing resyncing job
From: Song Liu <song@kernel.org>
Date: 2020-11-16 09:37:15
On Sat, Nov 14, 2020 at 8:30 PM Zhao Heming [off-list ref] wrote:
[...]
Signed-off-by: Zhao Heming <redacted>
The fix makes sense to me. But I really hope we can improve the commit log. I have made some changes to it, with a couple of TODOs for you (see below). Please read it, fill in the TODOs, and revise 2/2.

Thanks,
Song

md/cluster: block reshape with remote resync job

A reshape request should be blocked while a resync job is ongoing. In a cluster env, a node can start a resync job even if the resync cmd isn't executed on it, e.g., the user executes "mdadm --grow" on node A, and sometimes node B will start the resync job. However, the current update_raid_disks() only checks the local recovery status, which is incomplete. As a result, we see (TODO describe observed issue).

Fix this issue by blocking the reshape request. When a node executes "--grow" and detects an ongoing resync, it should stop and report an error to the user.

The following script reproduces the issue with (TODO: ???%) probability.
# on node1, node2 is the remote node.
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
sleep 5
mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
mdadm --grow --raid-devices=3 /dev/md0
mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --grow --raid-devices=2 /dev/md0
Cc: <redacted>
Signed-off-by: Zhao Heming <redacted>
---
 drivers/md/md.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f304ae..74280e353b8f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
[...]
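
For readers without the hunk at hand, the rule the commit log describes (refuse a reshape when either the local node or a remote node is resyncing) can be modeled roughly as below. This is only a user-space sketch, not the code from the patch: struct node_state, its fields, and reshape_allowed() are hypothetical names made up for illustration, and returning -EBUSY is an assumption, since the mail only says an error should be reported.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names come from md.c. */
struct node_state {
	bool local_recovery_running;	/* resync/recovery active on this node */
	bool remote_resync_running;	/* another cluster node is resyncing */
};

/*
 * Returns 0 if a reshape request may proceed, or a negative errno if it
 * must be rejected. -EBUSY is an assumed error code for this sketch.
 */
static int reshape_allowed(const struct node_state *s, bool clustered)
{
	/* The check that already existed: local recovery blocks reshape. */
	if (s->local_recovery_running)
		return -EBUSY;

	/* The added check: in a cluster, a remote resync also blocks it. */
	if (clustered && s->remote_resync_running)
		return -EBUSY;

	return 0;
}
```

With only the first check, a node that is idle locally would accept "--grow" even while another node is resyncing, which is the gap the commit log points out.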