Thread (5 messages), 4 authors, 2009-11-16

Re: mismatch_cnt again

From: Neil Brown <hidden>
Date: 2009-11-16 01:37:47

On Sun, 15 Nov 2009 17:29:17 -0500
"Guy Watkins" [off-list ref] wrote:
I have been following this issue some, and I think this could be a
cause for silent corruption on RAID5 and RAID6.  I don't think this
has been mentioned; if so, sorry.

RAID1/RAID10 are very different from RAID5/RAID6.

RAID1/RAID10 can get 'mismatches' due to the particular behaviour
of swap or filesystems.  However this doesn't matter (the blocks that
are inconsistent are of no interest to the filesystem).

RAID5/RAID6 is careful not to allow any mismatches to creep in
due to any particular filesystem or swap activity.  This is because,
as you say, those mismatches could be significant to the RAID
algorithm even though they might be of no interest to the filesystem.

Mismatches can only occur in a RAID5/RAID6 due to a software bug
in the md/raid code, or due to 'hardware errors' (including of course
drive firmware errors etc).

NeilBrown

If data blocks can be changed in memory before being written to disk,
even if the changed data blocks are never needed again from the
disk, the other related blocks in the stripe are at risk.  If the
parity block is computed, then 1 data block in memory is
changed, and then the blocks are written to disk, the parity will be
wrong.  If a disk then fails and is re-added or replaced, the data block
in that stripe will be recomputed using the changed block, giving a
corrupt value.  I am assuming the stripe has some data blocks that
hold needed data and at least 1 that is not needed, that the block
that was not needed was changed before it was written to disk, and that
the disk that failed did not hold the block that had been changed.
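
The scenario above can be sketched with plain XOR parity.  This is only
an illustration of the hypothesised failure, not of what md actually
does (per the reply above, RAID5/RAID6 is careful not to let this
happen).  The function names, the 3-data-block stripe and the block
contents are all made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity, RAID5-style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def simulate_stale_parity():
    """Hypothetical stripe: d0 and d1 hold needed data, d2 is not needed."""
    d0, d1, d2 = b"AAAA", b"BBBB", b"junk"

    # Parity is computed from the in-memory copies of the data blocks.
    parity = xor_blocks([d0, d1, d2])

    # Assumption for this sketch: d2 changes in memory after parity was
    # computed but before it is written, so the disk gets the new value.
    d2_on_disk = b"JUNK"

    # The disk holding d0 fails; d0 is rebuilt from the surviving blocks.
    rebuilt_d0 = xor_blocks([d1, d2_on_disk, parity])
    return d0, rebuilt_d0


if __name__ == "__main__":
    original, rebuilt = simulate_stale_parity()
    print("original d0:", original)
    print("rebuilt  d0:", rebuilt)
```

The rebuilt block differs from the original by exactly the bits that
changed in d2, even though d0 itself was never touched.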

I have a hard time conveying my thought in text.  I hope you
understand me.

Thanks for reading.
  