Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.
From: Eli Stair <hidden>
Date: 2006-10-06 22:42:22
This patch has resolved the immediate issue I was having on 2.6.18 with
RAID10. Previous to this change, after removing a device from the array
(with mdadm --remove), physically pulling the device and
changing/re-inserting, the "Number" of the new device would be
incremented on top of the highest-present device in the array. Now, it
resumes its previous place.
Does this look to be 'correct' output for a 14-drive array, which dev 8
was failed/removed from then "add"'ed? I'm trying to determine why the
device doesn't get pulled back into the active configuration and
re-synced. Any comments?
Thanks!
/eli
For example, currently when device dm-8 is removed it shows up like this:
    Number   Major   Minor   RaidDevice   State
       0      253       0        0        active sync   /dev/dm-0
       1      253       1        1        active sync   /dev/dm-1
       2      253       2        2        active sync   /dev/dm-2
       3      253       3        3        active sync   /dev/dm-3
       4      253       4        4        active sync   /dev/dm-4
       5      253       5        5        active sync   /dev/dm-5
       6      253       6        6        active sync   /dev/dm-6
       7      253       7        7        active sync   /dev/dm-7
       8        0       0        8        removed
       9      253       9        9        active sync   /dev/dm-9
      10      253      10       10        active sync   /dev/dm-10
      11      253      11       11        active sync   /dev/dm-11
      12      253      12       12        active sync   /dev/dm-12
      13      253      13       13        active sync   /dev/dm-13

       8      253       8        -        spare         /dev/dm-8
Previously, however, it would come back with the "Number" as 14, not 8 as
it should. Shortly thereafter things got all out of whack, in addition
to just not working properly :) Now I've just got to figure out how to
get the re-introduced drive to participate in the array again like it
should.
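
To spot this state from a script rather than by eye, here is a minimal
Python sketch. It assumes the device table has the column layout shown
above (Number, Major, Minor, RaidDevice, State, optional device path).
The names Member, parse_detail_rows and unfilled_slots are hypothetical
helpers for illustration, not part of mdadm, and the sample rows are
copied from the table above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Member(NamedTuple):
    number: int
    major: int
    minor: int
    raid_device: Optional[int]  # None when the column shows "-"
    state: str
    device: Optional[str]       # None for an empty ("removed") slot


def parse_detail_rows(text: str) -> List[Member]:
    """Parse device rows; the header and non-device lines are skipped."""
    members = []
    for line in text.splitlines():
        fields = line.split()
        # A device row starts with a numeric "Number" column and has a state.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        number, major, minor = (int(f) for f in fields[:3])
        raid_device = None if fields[3] == "-" else int(fields[3])
        rest = fields[4:]
        # The device path, when present, is the last field.
        device = rest.pop() if rest[-1].startswith("/") else None
        members.append(
            Member(number, major, minor, raid_device, " ".join(rest), device)
        )
    return members


def unfilled_slots(members: List[Member]) -> Tuple[List[int], List[str]]:
    """Return (slots marked removed, devices sitting idle as plain spares)."""
    empty = [m.raid_device for m in members if m.state == "removed"]
    spares = [m.device for m in members
              if m.raid_device is None and m.state == "spare"]
    return empty, spares


SAMPLE = """\
Number Major Minor RaidDevice State
7 253 7 7 active sync /dev/dm-7
8 0 0 8 removed
9 253 9 9 active sync /dev/dm-9
8 253 8 - spare /dev/dm-8
"""

print(unfilled_slots(parse_detail_rows(SAMPLE)))  # ([8], ['/dev/dm-8'])
```

An empty slot together with an idle spare is exactly the situation
described here: the device is back in the array's list under its old
Number, but it has not been pulled into slot 8 for resync.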
Eli Stair wrote:

I'm actually seeing similar behaviour on RAID10 (2.6.18), where after
removing a drive from an array re-adding it sometimes results in it
still being listed as a faulty-spare and not being "taken" for resync.
In the same scenario, after swapping drives, doing a fail,remove, then
an 'add' doesn't work, only a re-add will even get the drive listed by
MDADM.

What's the failure mode/symptoms that this patch is resolving?

Is it possible this affects the RAID10 module/mode as well? If not,
I'll start a new thread for that. I'm testing this patch to see if it
does remedy the situation on RAID10, and will update after some
significant testing.

/eli

NeilBrown wrote:
> There is a nasty bug in md in 2.6.18 affecting at least raid1.
> This fixes it (and has already been sent to stable@kernel.org).
>
> ### Comments for Changeset
>
> This fixes a bug introduced in 2.6.18.
>
> If a drive is added to a raid1 using older tools (mdadm-1.x or
> raidtools) then it will be included in the array without any resync
> happening.
>
> It has been submitted for 2.6.18.1.
>
>
> Signed-off-by: Neil Brown [off-list ref]
>
> ### Diffstat output
>  ./drivers/md/md.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c	2006-09-29 11:51:39.000000000 +1000
> +++ ./drivers/md/md.c	2006-10-05 16:40:51.000000000 +1000
> @@ -3849,6 +3849,7 @@ static int hot_add_disk(mddev_t * mddev,
>  	}
>  	clear_bit(In_sync, &rdev->flags);
>  	rdev->desc_nr = -1;
> +	rdev->saved_raid_disk = -1;
>  	err = bind_rdev_to_array(rdev, mddev);
>  	if (err)
>  		goto abort_export;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html