Re: mismatch_cnt again
From: Neil Brown <hidden>
Date: 2009-11-16 01:37:47
On Sun, 15 Nov 2009 17:29:17 -0500 "Guy Watkins" [off-list ref] wrote:
I have been following this issue some, and I think this could be a cause of silent corruption on RAID5 and RAID6. I don't think this has been mentioned; if it has, sorry.
RAID1/RAID10 are very different from RAID5/RAID6.

RAID1/RAID10 can get 'mismatches' due to the particular behaviour of swap or filesystems. However, this doesn't matter (the blocks that are inconsistent are of no interest to the filesystem).

RAID5/RAID6 is careful not to allow any mismatches to creep in due to any particular filesystem or swap activity. This is because, as you say, those mismatches could be significant to the RAID algorithm even though they might be of no interest to the filesystem.

Mismatches can only occur in a RAID5/RAID6 due to a software bug in the md/raid code, or due to 'hardware errors' (including of course drive firmware errors etc).

NeilBrown
If data blocks can be changed in memory before being written to disk, then even if the changed data blocks are never needed again from the disk, the other related blocks in the stripe are at risk. If the parity block is computed, then one data block is changed in memory, and then the blocks are written to disk, the parity would be wrong. If a disk fails and is re-added or replaced, the data block in that stripe will be computed using the changed block, giving a now-corrupt value.

I am assuming the stripe has some data blocks that hold needed data and at least one that was not needed, that the block that was not needed was changed before it was written to disk, and that the disk that failed did not hold the block that had been changed.

I have a hard time conveying my thoughts in text. I hope you understand me. Thanks for reading.
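
The scenario described above can be sketched in a few lines of Python. This is only a toy model of single-parity (RAID5-style) XOR arithmetic, not md's code: the function names, the block contents, and the three-data-block stripe are all made up for illustration, and the "race" is simply assumed to happen. As the reply notes, md's RAID5/RAID6 code is written so that this should not occur in practice.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def simulate_stale_parity():
    """Hypothetical stripe: d0 and d1 hold needed data, d2 is 'don't care' data."""
    d0, d1, d2 = b"AAAA", b"BBBB", b"junk"

    # Parity is computed from the in-memory copies of the data blocks.
    parity = xor_blocks([d0, d1, d2])

    # Assumed race: d2 changes in memory after parity was computed,
    # and the changed copy is what reaches the disk.
    d2_written = b"JUNK"
    on_disk = {"d0": d0, "d1": d1, "d2": d2_written, "p": parity}

    # A 'check' pass would recompute parity and find it inconsistent.
    recomputed = xor_blocks([on_disk["d0"], on_disk["d1"], on_disk["d2"]])
    mismatch = recomputed != on_disk["p"]

    # The disk holding d0 fails; d0 is rebuilt from the surviving blocks.
    rebuilt_d0 = xor_blocks([on_disk["d1"], on_disk["d2"], on_disk["p"]])
    return mismatch, d0, rebuilt_d0


if __name__ == "__main__":
    mismatch, original, rebuilt = simulate_stale_parity()
    print("parity mismatch:", mismatch)
    print("original d0:", original, "rebuilt d0:", rebuilt)
```

In this toy model the rebuilt d0 differs from the original even though d0 itself was never modified, which is the risk being described: the damage lands on a block that was needed, not on the block that changed.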