Re: RAID1 removing failed disk returns EBUSY
From: NeilBrown <hidden>
Date: 2015-02-02 06:36:01
Subsystem: software raid (multiple disks) support, the rest
Maintainers: Song Liu, Yu Kuai, Linus Torvalds
On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni [off-list ref] wrote:
----- Original Message -----
From: "NeilBrown" <redacted>
To: "Xiao Ni" <redacted>
Cc: "Joe Lawrence" <redacted>, linux-raid@vger.kernel.org, "Bill Kuzeja" <redacted>
Sent: Thursday, January 29, 2015 11:52:17 AM
Subject: Re: RAID1 removing failed disk returns EBUSY

On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni [off-list ref] wrote:
----- Original Message -----
From: "Joe Lawrence" <redacted>
To: "Xiao Ni" <redacted>
Cc: "NeilBrown" <redacted>, linux-raid@vger.kernel.org, "Bill Kuzeja" [off-list ref]
Sent: Friday, January 16, 2015 11:10:31 PM
Subject: Re: RAID1 removing failed disk returns EBUSY

On Fri, 16 Jan 2015 00:20:12 -0500 Xiao Ni [off-list ref] wrote:
Hi Joe

Thanks for reminding me. I didn't do that. Now it can remove successfully after writing "idle" to sync_action. I thought wrongly that the patch referenced in this mail is fixed for the problem.

So it sounds like even with 3.18 and a new mdadm, this bug still persists?

-- Joe --

Hi Joe

I'm a little confused now. Does the patch 45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable resolve the problem?

My environment is:

[root@dhcp-12-133 mdadm]# mdadm --version
mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014 (this is the newest upstream)
[root@dhcp-12-133 mdadm]# uname -r
3.18.2

My steps are:

[root@dhcp-12-133 mdadm]# lsblk
sdb 8:16 0 931.5G 0 disk
└─sdb1 8:17 0 5G 0 part
sdc 8:32 0 186.3G 0 disk
sdd 8:48 0 931.5G 0 disk
└─sdd1 8:49 0 5G 0 part
[root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Then I unplug the disk.

[root@dhcp-12-133 mdadm]# lsblk
sdc 8:32 0 186.3G 0 disk
sdd 8:48 0 931.5G 0 disk
└─sdd1 8:49 0 5G 0 part
  └─md0 9:0 0 5G 0 raid1
[root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
[root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
-bash: echo: write error: Device or resource busy
[root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
[root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state

I cannot reproduce this - using linux 3.18.2. I'd be surprised if mdadm version affects things.

Hi Neil

I'm very curious, because it can reproduce in my machine 100%.
This error (Device or resource busy) implies that rdev->raid_disk is >= 0 (tested in state_store()). ->raid_disk is set to -1 by remove_and_add_spares() providing:
 1/ it isn't Blocked (which is very unlikely)
 2/ hot_remove_disk succeeds, which it will if nr_pending is zero, and
 3/ nr_pending is zero.

I remember I have tried to check those reasons. But it really is reason 1, which is very unlikely.

I added some code in the function array_state_show:

array_state_show(struct mddev *mddev, char *page)
{
	enum array_state st = inactive;
	struct md_rdev *rdev;

	rdev_for_each_rcu(rdev, mddev) {
		printk(KERN_ALERT "search for %s\n", rdev->bdev->bd_disk->disk_name);
		if (test_bit(Blocked, &rdev->flags))
			printk(KERN_ALERT "rdev is Blocked\n");
		else
			printk(KERN_ALERT "rdev is not Blocked\n");
	}

When I echo 1 > /sys/block/sdc/device/delete, then I ran command:

[root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
read-auto
^^^^^^^^^
I think that is half the explanation. You must have the md_mod.start_ro parameter set to '1'.
[root@dhcp-12-133 md]# dmesg
[ 2679.559185] search for sdc
[ 2679.559189] rdev is Blocked
[ 2679.559190] search for sdb
[ 2679.559190] rdev is not Blocked

So sdc is Blocked
and that is the other half - thanks.
(yes, I was wrong. Sometimes it is easier than being right, but still yields results).

When a device fails, it is Blocked until the metadata is updated to record the failure. This ensures that no writes succeed without writing to that device, until we are certain that no read will try reading from that device, even after a crash/restart.

Blocked is cleared after the metadata is written, but read-auto (and read-only) devices never write out their metadata. So Blocked doesn't get cleared.

When you "echo idle > .../sync_action" one of the side effects is to switch from 'read-auto' to fully active. This allows the metadata to be written, Blocked to be cleared, and the device to be removed.

If you
   echo none > /sys/block/md0/md/dev-sdc/slot
first, then the remove will work.

We could possibly fix it with something like the following, but I'm not sure I like it. There is no guarantee that I can see which would ensure the superblock got updated before the first write if the array switched to read/write.

NeilBrown
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9233c71138f1..b3d1e8e5e067 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 	rdev_for_each(rdev, mddev)
 		if ((this == NULL || rdev == this) &&
 		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
+		    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
 		    (test_bit(Faulty, &rdev->flags) ||
 		     ! test_bit(In_sync, &rdev->flags)) &&
 		    atomic_read(&rdev->nr_pending)==0) {
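
For readers following the reasoning, here is a rough userspace sketch of the behaviour described in this thread. It is not kernel code. Every name in it (model_rdev, model_array, model_fail and so on) is made up for illustration, and the enum values are only an assumption about how read-only and read-auto are told apart from read/write. It models three things from the explanation: a failed member stays Blocked until the metadata is written, a read-auto array never writes its metadata, and the proposed change lets a Blocked member be detached while the array is not read/write. The "this == NULL || rdev == this" filter from the real loop is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified userspace model, NOT kernel code. All names are hypothetical. */

/* Assumption: any non-zero value means the array is not read/write. */
enum model_ro { MODEL_RW = 0, MODEL_RO = 1, MODEL_READ_AUTO = 2 };

struct model_rdev {
    int  raid_disk;   /* slot in the array, -1 once detached */
    bool faulty;
    bool in_sync;
    bool blocked;     /* set on failure, cleared by a metadata write */
    int  nr_pending;  /* outstanding I/O on this member */
};

struct model_array {
    enum model_ro ro;
    bool patched;     /* true: use the condition from the diff above */
    struct model_rdev *rdev;
};

/* Per the thread: a failed member is Blocked until the failure is recorded. */
void model_fail(struct model_rdev *r)
{
    r->faulty = true;
    r->blocked = true;
}

/* Per the thread: read-only and read-auto arrays never write metadata,
 * so Blocked is only cleared once the array is read/write. */
void model_write_metadata(struct model_array *a)
{
    if (a->ro == MODEL_RW)
        a->rdev->blocked = false;
}

/* The condition tested before a member's slot is released. */
bool model_can_detach(const struct model_array *a)
{
    const struct model_rdev *r = a->rdev;
    bool unblocked = !r->blocked || (a->patched && a->ro != MODEL_RW);

    return r->raid_disk >= 0 &&
           unblocked &&
           (r->faulty || !r->in_sync) &&
           r->nr_pending == 0;
}

/* Models writing "remove" to the member's state attribute:
 * EBUSY is reported while the member still holds a slot. */
int model_state_remove(struct model_array *a)
{
    model_write_metadata(a);
    if (model_can_detach(a))
        a->rdev->raid_disk = -1;
    return a->rdev->raid_disk >= 0 ? -EBUSY : 0;
}

/* Models the side effect of writing "idle" to sync_action:
 * the array leaves read-auto and becomes fully active. */
void model_sync_action_idle(struct model_array *a)
{
    a->ro = MODEL_RW;
}
```

In this sketch, calling model_state_remove() on a failed member of an unpatched read-auto array returns -EBUSY, and the same call succeeds after model_sync_action_idle(). With patched set, the first call already succeeds, which is exactly the case where nothing in the model guarantees the failure was recorded in the metadata before the array goes read/write.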