Re: RAID-5 implementation questions

From: Phil Karn <hidden>
Date: 2010-12-03 12:02:23

On 12/3/10 2:02 AM, Mikael Abrahamsson wrote:

> "--assume-clean".

Thanks.

> Some raid implementations won't read/write to all drives, but might
> instead read the block being written to, and the parity block, then
> write the new block and recalculate the parity, thus not read/writing to
> all blocks. If this is the case, if the parity is wrong, it'll still be
> wrong after the operation, thus you don't have any redundancy.

Good point. That had occurred to me too but I didn't know if Linux did
that. I can see how one might dynamically pick one way or the other
depending on how much of the stripe is already in the buffer cache.
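
To make the difference concrete, here is a toy sketch of the two update strategies on a single stripe. It is a simulation in memory, not how md is implemented; the names xor_blocks, rmw_write and full_stripe_write are made up for illustration, and the stripe is assumed to have three data blocks and one parity block.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rmw_write(data, parity, idx, new_block):
    """Read-modify-write: touches only data[idx] and the parity block."""
    old_block = data[idx]
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks([parity, old_block, new_block])
    new_data = data[:idx] + [new_block] + data[idx + 1:]
    return new_data, new_parity

def full_stripe_write(data, idx, new_block):
    """Reconstruct-write: reads every data block and recomputes parity."""
    new_data = data[:idx] + [new_block] + data[idx + 1:]
    return new_data, xor_blocks(new_data)

# A stripe created without an initial sync: parity is whatever was on disk.
data = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4]
stale_parity = b"\xff" * 4  # deliberately not the XOR of the data blocks

d1, p1 = rmw_write(data, stale_parity, 0, b"\x44" * 4)
d2, p2 = full_stripe_write(data, 0, b"\x44" * 4)

rmw_consistent = (p1 == xor_blocks(d1))
full_consistent = (p2 == xor_blocks(d2))
print(rmw_consistent, full_consistent)  # False True
```

The read-modify-write path carries the original parity error forward unchanged, while the full-stripe path repairs it as a side effect.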

> Doing a rebuild when creating the array is something I'd only skip if I
> was doing lab work, never in production. I use raid for redundancy, thus
> I want to make sure everything is ok and it doesn't matter to me if it
> takes half a day.

I hear you. But I think an important special case is when you're
initially loading a new RAID-5 array from an existing (typically
smaller) file system that will then be replaced by the new array.

Why not let the new array work something like a RAID-0, leaving the
parity blocks unwritten until you're finished loading the array? Then
pass through the array writing all the parity blocks with the final
data. If a drive fails in the new array before you're done, you still
have all your original data; you haven't lost anything.
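
A rough sketch of what I mean, as a toy in-memory model. This is a hypothetical design, not an existing md mode as far as I know; the class DeferredParityArray and its methods are invented for illustration, and parity is kept in a separate list rather than rotated across drives as real RAID-5 does.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class DeferredParityArray:
    """Toy array: n_data data blocks plus one parity block per stripe."""

    def __init__(self, n_stripes, n_data, block_size):
        zero = bytes(block_size)
        self.stripes = [[zero] * n_data for _ in range(n_stripes)]
        self.parity = [None] * n_stripes  # None means parity not yet written

    def load(self, stripe, idx, block):
        """Bulk-load phase: write data only, RAID-0 style, no parity I/O."""
        self.stripes[stripe][idx] = block
        self.parity[stripe] = None

    def finalize(self):
        """One pass over the array, writing parity from the final data."""
        for s, blocks in enumerate(self.stripes):
            self.parity[s] = xor_blocks(blocks)

    def rebuild(self, stripe, lost_idx):
        """Reconstruct one lost data block; needs parity to be in place."""
        if self.parity[stripe] is None:
            raise RuntimeError("no redundancy yet: parity pass has not run")
        survivors = [b for i, b in enumerate(self.stripes[stripe]) if i != lost_idx]
        return xor_blocks(survivors + [self.parity[stripe]])

arr = DeferredParityArray(n_stripes=2, n_data=3, block_size=4)
for i, blk in enumerate([b"abcd", b"efgh", b"ijkl"]):
    arr.load(0, i, blk)

arr.finalize()                # the single parity pass after loading
recovered = arr.rebuild(0, 1)
print(recovered)              # b'efgh'
```

Until finalize() runs the array has no redundancy at all, which is acceptable only because the original file system still holds the data.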

Ultimately, RAID-5 in software is always going to be at least somewhat
vulnerable because of the lack of an atomic (all or none) committed
write of all the blocks in a stripe. An interrupted write might silently corrupt an old,
stable file in a way that you won't notice until a drive fails and you
don't have the redundancy you thought you had to reconstruct it. I can
accept losing whatever files I was writing at the time of a crash, but
silent corruption of an old and stable file seems far more insidious. I
do periodically run checkarray to ensure that the parities are
consistent, but this takes a long time and seems inelegant somehow.
Maybe we need software ECC on all data so that one doesn't have to rely
on the drive itself to detect errors.
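
Here is a small simulation of the failure mode I'm worried about, with a per-block checksum standing in for the "software ECC" idea. It is a toy model under stated assumptions: one stripe of three data blocks plus parity, a crash after the data write but before the parity write, and a hypothetical checksum table stored somewhere outside the stripe. A CRC only detects the damage; it cannot correct it.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A consistent stripe. Block 2 belongs to an old, stable file.
data = [b"AAAA", b"BBBB", b"OLD!"]
parity = xor_blocks(data)
checksums = [zlib.crc32(b) for b in data]  # hypothetical software checksums

# Crash while updating block 0: the data reached the disk, the parity did not.
data[0] = b"NEW!"
checksums[0] = zlib.crc32(data[0])
# 'parity' is now stale, but every block still reads back without error.

# A parity scrub would notice the mismatch, if it runs before a drive fails.
scrub_ok = (xor_blocks(data) == parity)

# Later the drive holding block 2 fails; rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[1], parity])
silently_corrupt = (rebuilt != b"OLD!")
detected = (zlib.crc32(rebuilt) != checksums[2])

print(scrub_ok, silently_corrupt, detected)  # False True True
```

The file that was being written at crash time is fine here; it is the untouched block that comes back wrong after the rebuild, and only the independent checksum reveals it.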

Thanks,

Phil
