Thread: 4 messages, 2 authors, 2006-10-10

Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.

From: Eli Stair <hidden>
Date: 2006-10-06 22:42:22

This patch has resolved the immediate issue I was having on 2.6.18 with
RAID10.  Before this change, after removing a device from the array
(with mdadm --remove), physically pulling the device and changing or
re-inserting it, the "Number" of the new device would be set one higher
than the highest Number present in the array.  Now it resumes its
previous place.

Does this look like 'correct' output for a 14-drive array from which
dev 8 was failed/removed and then "add"ed?  I'm trying to determine why
the device doesn't get pulled back into the active configuration and
re-synced.  Any comments?

Thanks!

/eli

For example, currently when device dm-8 is removed it shows up like this:



     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
        2     253        2        2      active sync   /dev/dm-2
        3     253        3        3      active sync   /dev/dm-3
        4     253        4        4      active sync   /dev/dm-4
        5     253        5        5      active sync   /dev/dm-5
        6     253        6        6      active sync   /dev/dm-6
        7     253        7        7      active sync   /dev/dm-7
        8       0        0        8      removed
        9     253        9        9      active sync   /dev/dm-9
       10     253       10       10      active sync   /dev/dm-10
       11     253       11       11      active sync   /dev/dm-11
       12     253       12       12      active sync   /dev/dm-12
       13     253       13       13      active sync   /dev/dm-13

        8     253        8        -      spare   /dev/dm-8


Previously, however, it would come back with the "Number" as 14, not 8
as it should.  Shortly thereafter things got all out of whack, in
addition to just not working properly :)  Now I've just got to figure
out how to get the re-introduced drive to participate in the array
again like it should.
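
The listing earlier in this message can also be checked mechanically
rather than by eye.  What follows is only a rough sketch, assuming the
five-column layout shown above (Number, Major, Minor, RaidDevice,
State, then an optional device path).  The names md_row and
parse_md_row are made up for illustration and are not part of mdadm or
the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one row of the device listing. */
struct md_row {
    int  number, major, minor;
    int  raid_device;   /* -1 when the RaidDevice column shows "-" */
    char state[32];     /* e.g. "active sync", "removed", "spare" */
    char device[64];    /* empty when no device path is listed */
};

/* Returns 1 if the line is a device row, 0 otherwise (header, blank). */
int parse_md_row(const char *line, struct md_row *out)
{
    char slot[16], rest[128];
    int consumed = 0;
    size_t len;
    char *last, *tok;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%d %d %d %15s %n", &out->number, &out->major,
               &out->minor, slot, &consumed) != 4)
        return 0;
    out->raid_device = strcmp(slot, "-") == 0 ? -1 : atoi(slot);

    /* Copy what follows the four columns and trim trailing whitespace. */
    snprintf(rest, sizeof rest, "%s", line + consumed);
    len = strlen(rest);
    while (len > 0 && isspace((unsigned char)rest[len - 1]))
        rest[--len] = '\0';

    /* A final token starting with '/' is taken to be the device path. */
    out->device[0] = '\0';
    last = strrchr(rest, ' ');
    tok = last ? last + 1 : rest;
    if (tok[0] == '/') {
        snprintf(out->device, sizeof out->device, "%s", tok);
        *tok = '\0';
        len = strlen(rest);
        while (len > 0 && isspace((unsigned char)rest[len - 1]))
            rest[--len] = '\0';
    }

    /* Whatever remains is the state text. */
    snprintf(out->state, sizeof out->state, "%s", rest);
    return 1;
}
```

Feeding each line of the listing through parse_md_row and looking for
a row whose state is "removed" alongside a row whose state is "spare"
with the same Number would flag the situation shown above, where dm-8
is present but has not been taken back into slot 8.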

Eli Stair wrote:

I'm actually seeing similar behaviour on RAID10 (2.6.18), where after
removing a drive from an array, re-adding it sometimes results in it
still being listed as a faulty-spare and not being "taken" for resync.
In the same scenario, after swapping drives, doing a fail, remove, then
an 'add' doesn't work; only a re-add will even get the drive listed by
mdadm.


What's the failure mode/symptoms that this patch is resolving?

Is it possible this affects the RAID10 module/mode as well?  If not,
I'll start a new thread for that.  I'm testing this patch to see if it
does remedy the situation on RAID10, and will update after some
significant testing.


/eli








NeilBrown wrote:
 > There is a nasty bug in md in 2.6.18 affecting at least raid1.
 > This fixes it (and has already been sent to stable@kernel.org).
 >
 > ### Comments for Changeset
 >
 > This fixes a bug introduced in 2.6.18.
 >
 > If a drive is added to a raid1 using older tools (mdadm-1.x or
 > raidtools) then it will be included in the array without any resync
 > happening.
 >
 > It has been submitted for 2.6.18.1.
 >
 >
 > Signed-off-by: Neil Brown [off-list ref]
 >
 > ### Diffstat output
 >  ./drivers/md/md.c |    1 +
 >  1 file changed, 1 insertion(+)
 >
 > diff .prev/drivers/md/md.c ./drivers/md/md.c
 > --- .prev/drivers/md/md.c       2006-09-29 11:51:39.000000000 +1000
 > +++ ./drivers/md/md.c   2006-10-05 16:40:51.000000000 +1000
 > @@ -3849,6 +3849,7 @@ static int hot_add_disk(mddev_t * mddev,
 >         }
 >         clear_bit(In_sync, &rdev->flags);
 >         rdev->desc_nr = -1;
 > +       rdev->saved_raid_disk = -1;
 >         err = bind_rdev_to_array(rdev, mddev);
 >         if (err)
 >                 goto abort_export;
 > -
 > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
 > the body of a message to majordomo@vger.kernel.org
 > More majordomo info at  http://vger.kernel.org/majordomo-info.html
 >
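
The effect of the one-line change in the quoted hunk can be pictured
with a small userspace toy model.  This is not kernel code: toy_rdev,
toy_hot_add_prepare and toy_needs_full_resync are invented names, and
the rule in toy_needs_full_resync (a full resync is skipped when the
saved slot equals the slot the device is assigned) is an assumption
made for illustration, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's rdev; only fields seen in the hunk. */
struct toy_rdev {
    int  desc_nr;
    int  saved_raid_disk;
    bool in_sync;
};

/*
 * Mirrors the assignments in the quoted hunk.  apply_fix selects
 * whether the line added by the patch is executed.
 */
void toy_hot_add_prepare(struct toy_rdev *rdev, bool apply_fix)
{
    assert(rdev != NULL);
    rdev->in_sync = false;            /* clear_bit(In_sync, ...) */
    rdev->desc_nr = -1;
    if (apply_fix)
        rdev->saved_raid_disk = -1;   /* the line added by the patch */
}

/*
 * ASSUMPTION (illustration only): the array decides on a full resync
 * by comparing the device's saved slot with the slot it is given.
 */
bool toy_needs_full_resync(const struct toy_rdev *rdev, int assigned_slot)
{
    assert(rdev != NULL);
    return rdev->saved_raid_disk != assigned_slot;
}
```

Under that assumption, a device whose saved_raid_disk happens to hold
a stale 0 and which is placed in slot 0 would be treated as already
matching that slot when the fix is off, which is consistent with the
"included in the array without any resync" symptom described in the
changeset comments.  With the fix on, saved_raid_disk is -1 and can
never equal a real slot number, so the toy model always asks for a
full resync.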
