Re: Synchronous vs asynchronous mdadm operations
From: Chris Webb <hidden>
Date: 2008-12-04 10:59:53
Chris Webb [off-list ref] writes: [Re: mdadm --stop being potentially asynchronous]
The reason for the question is that I'm seeing occasional cases of arrays which won't reassemble following such an operation. dmesg alleges there is an invalid superblock for all of the six slots which were originally part of the array.
I tracked this one down to my scripts, which were failing to adjust the available space on the rdevs in a particularly rare case. However, I'm still wondering about the best way to do a fail/remove combination, given that fail appears to be asynchronous. The shell fragment I give below seems way over the top, but I can't see any simpler route....
I notice that some mdadm operations appear to be asynchronous. For instance,
mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1
will always fail at the --remove stage with
mdadm: hot remove failed for /dev/mapper/slot.51000.1: Device or resource busy
whereas adding a short sleep in between will make it successful.
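A fixed sleep is a guess at how long the fail takes to settle. A bounded retry around the --remove is one way to avoid picking a delay. The following is only a minimal sketch: retry_cmd, its argument order and the retry counts are hypothetical names and values, not anything mdadm provides, and it assumes only that mdadm exits non-zero when the hot remove fails.

```shell
# Hypothetical helper (not part of mdadm): run a command until it succeeds,
# giving up after a fixed number of attempts.
# Usage: retry_cmd TRIES DELAY COMMAND [ARGS...]
retry_cmd() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        tries=$((tries - 1))
        # Only sleep if another attempt is still to come.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    # Every attempt failed.
    return 1
}
```

With that defined, the remove step would become something like "retry_cmd 50 0.1 mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1", which gives up after roughly five seconds rather than looping forever. It retries on any failure, not just "Device or resource busy", so it is cruder than checking the rdev state directly.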
Is there a 'standard' way to wait for this operation to complete or to
perform both steps in one go, other than something horrible like:
mdadm --fail /dev/md/shelf.51000 /dev/mapper/slot.51000.1
MD=$((`stat -c '%#T' -L /dev/md/shelf.51000`))
MAJOR=$((`stat -c '%#t' -L /dev/mapper/slot.51000.1`))
MINOR=$((`stat -c '%#T' -L /dev/mapper/slot.51000.1`))
for RD in /sys/block/md$MD/md/rd*; do
[ -f $RD/block/dev ] || continue
[ "`<$RD/block/dev`" = "$MAJOR:$MINOR" ] || continue
while [ "`<$RD/state`" != "faulty" ]; do sleep 0.1; done
done
mdadm --remove /dev/md/shelf.51000 /dev/mapper/slot.51000.1

Cheers, Chris.