Re: [mdadm git pull] support for removed disks / imsm updates

From: Dan Williams <hidden>
Date: 2009-03-04 23:59:27

On Wed, Mar 4, 2009 at 3:41 PM, Neil Brown [off-list ref] wrote:

On Friday February 27, dan.j.williams@intel.com wrote:


2/ Support for handling removed disks, as currently all container
manipulations fail once a live disk is hot-unplugged.

So this is when md thinks the device is in the array, but the device
has actually been removed, so the block/dev file is missing or
empty, or the status is not 'online'.
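For readers unfamiliar with the sysfs layout, a minimal C sketch of the check described above might look like the following. It assumes each member appears under /sys/block/<md>/md/dev-<member>/ with a "block" link to the member's own sysfs directory. The helper names (read_attr, member_gone_from, member_is_gone) are hypothetical and are not mdadm's code, and treating both "online" and "running" as healthy state strings is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs attribute into buf, newline stripped.
 * Returns buf ("" for an empty file), or NULL if it cannot be opened. */
static const char *read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return NULL;
	if (fgets(buf, (int)len, f) == NULL)
		buf[0] = '\0';		/* empty attribute */
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return buf;
}

/* Decide from attribute contents (NULL = unreadable) whether a member
 * is gone. Assumption: "online" and "running" are the healthy states. */
static int member_gone_from(const char *dev_attr, const char *state_attr)
{
	/* block/dev holds "major:minor"; missing or empty means gone */
	if (dev_attr == NULL || dev_attr[0] == '\0')
		return 1;
	/* not every device exports a state file; judge it only if present */
	if (state_attr != NULL && state_attr[0] != '\0' &&
	    strcmp(state_attr, "online") != 0 &&
	    strcmp(state_attr, "running") != 0)
		return 1;
	return 0;
}

/* Hypothetical wrapper; 'root' is normally "/sys/block". */
static int member_is_gone(const char *root, const char *md, const char *dev)
{
	char path[512], dev_buf[64], state_buf[64];
	const char *dev_attr, *state_attr;

	assert(root != NULL && md != NULL && dev != NULL);
	snprintf(path, sizeof(path), "%s/%s/md/dev-%s/block/dev",
		 root, md, dev);
	dev_attr = read_attr(path, dev_buf, sizeof(dev_buf));
	snprintf(path, sizeof(path), "%s/%s/md/dev-%s/block/device/state",
		 root, md, dev);
	state_attr = read_attr(path, state_buf, sizeof(state_buf));
	return member_gone_from(dev_attr, state_attr);
}
```

Splitting the decision (member_gone_from) from the sysfs reads keeps the policy testable without a real hot-unplug.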

But we only check for that if mdmon is running.  That seems odd to
me, though I'm not entirely sure it is wrong.
Why do we want to treat this case differently depending on whether
mdmon is running or not?

The thinking, dubious or otherwise, is that if mdmon is not running
then the administrator is in charge of managing the container, and
would want to know about these errors.  I could not convince myself
that we *always* wanted to ignore missing disks here... so I erred
conservative.

However, we have already found another location where SKIP_GONE_DEVS
is needed, so part of me wonders whether we should just make it the default.
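The trade-off Dan describes can be sketched as an option bit that turns a gone member from a hard error into a silent skip. In this hypothetical C fragment, OPT_SKIP_GONE stands in for SKIP_GONE_DEVS (its value, and count_members(), are illustrative only). A caller would set the bit when mdmon manages the container and leave it clear when the administrator does, so that errors surface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bit modeled on SKIP_GONE_DEVS; value is arbitrary. */
#define OPT_SKIP_GONE 1

/* Count usable members of a container. gone[i] != 0 marks a member
 * that has disappeared. With OPT_SKIP_GONE the member is skipped;
 * without it the whole scan fails (-1) so the caller sees the error. */
static int count_members(const int *gone, size_t n, int options)
{
	int count = 0;
	size_t i;

	assert(gone != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (gone[i]) {
			if (options & OPT_SKIP_GONE)
				continue;	/* mdmon-managed: ignore it */
			return -1;		/* admin-managed: report it */
		}
		count++;
	}
	return count;
}
```

Making the behaviour the default would amount to passing OPT_SKIP_GONE unconditionally.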


3/ An initial mdmon man page
4/ imsm auto layout support
5/ Updates to --incremental in pursuit of assembling external metadata
arrays in the initramfs via udev events

Thanks.

Most look good.
My attention was caught by "Create: wait_for container creation".

I vaguely remember trying that and it didn't work.  Something about
the md array not being in the right sort of state for udev to create a
device, or something...  But I expect you have tested it so maybe I'm
remembering something else.

It corrected a test script failure here, fwiw, but I will keep an eye
out for container creation deadlocks.
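Assuming "waiting for container creation" means polling until udev has created the container's /dev node, a bounded polling loop is one way to express it. The helper below, wait_for_node(), is a hypothetical sketch and not mdadm's own wait_for(); the retry count and delay are caller-chosen assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Poll until 'path' exists (e.g. a /dev node created by udev) or the
 * retry budget runs out. Returns 0 if it appeared, -1 on timeout. */
static int wait_for_node(const char *path, int tries, long delay_ms)
{
	struct stat st;
	struct timespec ts;
	int i;

	assert(path != NULL && tries > 0 && delay_ms >= 0);
	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;
	for (i = 0; i < tries; i++) {
		if (stat(path, &st) == 0)
			return 0;		/* node is there */
		if (i + 1 < tries)
			nanosleep(&ts, NULL);	/* give udev time */
	}
	return -1;
}
```

Bounding the loop means a node that never appears produces an error rather than blocking creation forever.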


The one "fix" that is missing from this update is to teach mdmon to kick
"non-fresh" drives similar to what the kernel does at initial assembly.
I dropped the attempt after realizing I would need to take an O_EXCL
open on the container in an awkward place.  I guess it is not necessary,
but it is a quirk of containers that known failed drives can be allowed
back into the container.
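At initial assembly the kernel, roughly speaking, compares each member's superblock event counter with that of the freshest member and kicks members that lag behind. A simplified C sketch of that comparison follows. mark_non_fresh() is hypothetical, and the kernel's real checks differ in detail (for example around spares and write-intent bitmaps). It also ignores the problem Dan raises: acting on the result inside mdmon would still need the O_EXCL open on the container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mark members whose event counter is below the freshest member's.
 * stale[i] is set to 1 for a lagging member, 0 otherwise.
 * Returns the number of members marked stale. */
static size_t mark_non_fresh(const uint64_t *events, size_t n, int *stale)
{
	uint64_t max = 0;
	size_t i, kicked = 0;

	assert((events != NULL && stale != NULL) || n == 0);
	for (i = 0; i < n; i++)		/* find the freshest counter */
		if (events[i] > max)
			max = events[i];
	for (i = 0; i < n; i++) {	/* anything behind it is non-fresh */
		stale[i] = events[i] < max;
		kicked += (size_t)stale[i];
	}
	return kicked;
}
```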

I always thought it was a slightly odd quirk that if you had an array
with failed drives, then stopped and restarted the array, those failed
drives would no longer be there.
My feeling is that it doesn't matter a great deal one way or the
other.  The important thing is that when mdadm describes the state of
an array, it describes it in a way that doesn't confuse people (an
area in which v1.x metadata lets us down at the moment).

Ok, that clarifies things...

[..]

For now, all these patches have been pulled and pushed to neil.brown.name/mdadm

Thanks!

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
