Re: RAID-5 implementation questions
From: Phil Karn <hidden>
Date: 2010-12-03 12:02:23
On 12/3/10 2:02 AM, Mikael Abrahamsson wrote:
> "--assume-clean".
Thanks.
> Some raid implementations won't read/write to all drives, but might instead read the block being written to, and the parity block, then write the new block and recalculate the parity, thus not read/writing to all blocks. If this is the case, if the parity is wrong, it'll still be wrong after the operation, thus you don't have any redundancy.
Good point. That had occurred to me too but I didn't know if Linux did that. I can see how one might dynamically pick one way or the other depending on how much of the stripe is already in the buffer cache.
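To make the arithmetic behind that point concrete, here is a minimal sketch of the two parity-update strategies on a toy stripe. It is not taken from the Linux md code; the function names (`xor_blocks`, `rmw_parity`, `full_stripe_parity`) and the block contents are made up for illustration. It shows that a read-modify-write update carries an existing parity error forward unchanged, whereas recomputing parity from the whole stripe would repair it.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rmw_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Read-modify-write: touch only the target block and the parity block."""
    return xor_blocks(old_parity, old_block, new_block)


def full_stripe_parity(data_blocks: list) -> bytes:
    """Recompute parity from every data block in the stripe."""
    return xor_blocks(*data_blocks)


# Toy stripe: three data blocks and a parity block that was never synced
# (hypothetical contents, as if the array had been created without a rebuild).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stale_parity = b"\xaa\xbb"
good_parity = full_stripe_parity(data)

# Overwrite the first data block using each strategy.
new_block = b"\x55\x66"
rmw_result = rmw_parity(stale_parity, data[0], new_block)
data_after = [new_block, data[1], data[2]]
recomputed = full_stripe_parity(data_after)

# The difference between the stored and the correct parity, before and after.
parity_error_before = xor_blocks(stale_parity, good_parity)
parity_error_after = xor_blocks(rmw_result, recomputed)

print(parity_error_before == parity_error_after)  # the error is carried over
print(rmw_parity(good_parity, data[0], new_block) == recomputed)
```

The second comparison is the reassuring half: if the parity was correct to begin with, the shortcut and the full recomputation agree, so the shortcut is only a problem on a stripe whose parity was never initialised.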
> Doing a rebuild when creating the array is something I'd only skip if I was doing lab work, never in production. I use raid for redundancy, thus I want to make sure everything is ok and it doesn't matter to me if it takes half a day.
I hear you. But I think an important special case is when you're initially loading a new RAID-5 array from an existing (typically smaller) file system that will then be replaced by the new array. Why not let the new array work something like a RAID-0, leaving the parity blocks unwritten until you're finished loading the array? Then make one pass through the array, writing all the parity blocks from the final data. If a drive fails in the new array before you're done, you still have all your original data; you haven't lost anything.

Ultimately, RAID-5 in software is always going to be at least somewhat vulnerable because of the lack of an atomic (all or none) committed write of all the blocks in a stripe. A crash partway through a stripe update might silently corrupt an old, stable file in a way that you won't notice until a drive fails and you don't have the redundancy you thought you had to reconstruct it. I can accept losing whatever files I was writing at the time of a crash, but silent corruption of an old and stable file seems far more insidious.

I do periodically run checkarray to ensure that the parities are consistent, but this takes a long time and seems inelegant somehow. Maybe we need software ECC on all data so that one doesn't have to rely on the drive itself to detect errors.

Thanks,
Phil
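
For readers who want to see the non-atomic stripe write in miniature, here is a toy sketch; it is not a model of any particular RAID implementation. The drive layout, block contents, and variable names are invented for illustration. A stripe holds an old, untouched block and a block being rewritten; the new data reaches its drive, the parity update is lost in a crash, and a later failure of the other drive then reconstructs the old block incorrectly.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A consistent stripe: two data blocks on different drives plus parity.
old_file = b"stable!!"   # drive 0: old, stable data that nobody is writing
scratch = b"scratch1"    # drive 1: block belonging to a file being written
parity = xor_blocks(old_file, scratch)

# Crash during a stripe update: the new data block reaches drive 1,
# but the matching parity write never happens.
scratch = b"scratch2"

# While both data drives work, the old file still reads back correctly.
print(old_file)

# Later, drive 0 fails and its block is rebuilt from the survivors.
rebuilt = xor_blocks(scratch, parity)
print(rebuilt == old_file)  # the untouched block no longer reconstructs
```

Nothing is visibly wrong until the drive holding the untouched block fails, which is why a periodic consistency check is the only way to catch this sort of damage in advance.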