Thread: 6 messages, 6 authors, 2009-10-13

Re: Random bit flips - better data integrity needed [Was: Re: mismatch_count != 0 on multiple hosts]

From: Bill Davidsen <hidden>
Date: 2009-10-13 21:45:26

Matthias Urlichs wrote:
On Sat, 19 Sep 2009 12:10:34 -0400, Greg Freemyer wrote:

  
Specifically you could steal the second parity stripe from a raid 6
setup and replace it with this end-to-end data integrity checksum / crc.
    
If you're willing to add that kind of overhead, simply read all of the 
RAID6 stripes into memory and check whether they're consistent.

If not, it's easy to decide (for RAID6) whether the data or the parity is 
wrong: simply check both P and Q. If only one is broken, rewrite it. If both 
are, correct the data according to P and check whether Q now matches. If 
it does, keep the corrected data. Otherwise the only thing you can do is to 
fail the whole array, and to alert the operator that they have major 
hardware issues. :-/
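
For illustration, here is a minimal Python sketch of that decision. It is not the md driver's code. It assumes the usual RAID6 arithmetic (GF(2^8) with polynomial 0x11d and generator 2) and assumes at most one block in the stripe is bad. Instead of trying a P-based correction and re-checking Q, it uses the ratio of the Q and P mismatches to find the data disk, which is an equivalent way to locate a single bad data block. The names compute_pq and check_stripe are made up for this example.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2 (assumed).
GF_EXP = [0] * 510
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def compute_pq(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p = bytearray(n)
    q = bytearray(n)
    for i, block in enumerate(data):
        coeff = GF_EXP[i]  # g**i for data disk i
        for j, b in enumerate(block):
            p[j] ^= b
            q[j] ^= gf_mul(coeff, b)
    return bytes(p), bytes(q)


def check_stripe(data, p, q):
    """Classify a stripe as ("ok" | "p" | "q" | "data" | "unrecoverable", disk)."""
    exp_p, exp_q = compute_pq(data)
    pd = bytes(a ^ b for a, b in zip(p, exp_p))  # P mismatch per byte
    qd = bytes(a ^ b for a, b in zip(q, exp_q))  # Q mismatch per byte
    p_bad, q_bad = any(pd), any(qd)
    if not p_bad and not q_bad:
        return ("ok", None)
    if p_bad and not q_bad:
        return ("p", None)   # only P is wrong: recompute P
    if q_bad and not p_bad:
        return ("q", None)   # only Q is wrong: recompute Q
    # Both mismatch. A single bad data disk z gives qd == g**z * pd bytewise.
    z = None
    for a, b in zip(pd, qd):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return ("unrecoverable", None)
        cand = (GF_LOG[b] - GF_LOG[a]) % 255
        if z is None:
            z = cand
        elif z != cand:
            return ("unrecoverable", None)
    if z >= len(data):
        return ("unrecoverable", None)
    return ("data", z)       # repair: XOR pd into data[z]
```

A "data" verdict is only trustworthy under the single-fault assumption. Two bad blocks can either be flagged as unrecoverable or, in unlucky cases, look like a single bad data disk.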

For RAID4/5, you can do the same check, except that there's no way to fix 
any problems since you don't know whether data or parity is right. As the 
error may have crept in upon writing, rereading is of limited use.
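
A small Python sketch of why single parity can only detect the problem: the XOR over data plus parity (called syndrome here, a name chosen for this example) comes out the same whether the flipped bit sits in a data block or in the parity block.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for j, b in enumerate(block):
            out[j] ^= b
    return bytes(out)


def syndrome(data, parity):
    """All-zero when the stripe is consistent; nonzero bytes mark a mismatch."""
    return xor_blocks(list(data) + [parity])


data = [b"\x01\x02", b"\x10\x20"]
parity = xor_blocks(data)

flipped_data = [b"\x05\x02", b"\x10\x20"]            # bit flip in data block 0
flipped_parity = bytes([parity[0] ^ 0x04]) + parity[1:]  # same flip in parity

# Both faults produce an identical syndrome, so the culprit cannot be named.
same = syndrome(flipped_data, parity) == syndrome(data, flipped_parity)
```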

For RAID1 (and maybe even multipath), the same idea applies; add majority 
rule when you have more than two disks.
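
The majority rule could look roughly like this in Python. It is a sketch only; majority_block is a made-up helper, and it treats each mirror's copy of a block as an opaque bytes value.

```python
from collections import Counter


def majority_block(copies):
    """Return (winning_block, indexes_of_dissenting_mirrors).

    Returns (None, []) when no copy has a strict majority, which is
    always the case for a two-disk mirror whose copies disagree.
    """
    counts = Counter(copies)
    block, votes = counts.most_common(1)[0]
    if votes * 2 > len(copies):
        dissent = [i for i, c in enumerate(copies) if c != block]
        return block, dissent
    return None, []
```

With three mirrors and one bad copy this names the odd one out. With two mirrors a mismatch can be detected but not resolved.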

Adding this kind of checking to the RAID456 driver should be rather easy 
for somebody who knows its internals. Its effect on read throughput is 
anyone's guess, of course.
  
To do this right requires forcing the data to the platter, then reading 
it back (from the platter, not from cache) and checking it, preferably 
reading with ECC off to catch marginal data. In the 1960s there were 
drives with read-after-write heads, but the data density was so low you 
could sprinkle oxide on the platter and see data patterns. I can't see 
doing it that way with "heads" any more, but when solid state becomes 
more mainstream it should become possible with useful transfer rates.
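
From user space, the closest approximation I know of looks something like the Python sketch below. It only goes part of the way: fsync asks the kernel to flush the file to the device, and posix_fadvise with POSIX_FADV_DONTNEED asks it to drop the cached pages, but the drive's own cache may still answer the read, and ECC cannot be switched off this way. write_and_verify is a made-up helper name.

```python
import os


def write_and_verify(path, payload):
    """Write payload, flush it, then re-read and compare. Returns True on match."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        while offset < len(payload):
            offset += os.write(fd, payload[offset:])
        os.fsync(fd)  # push the data out of the page cache to the device
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint that cached pages should be dropped (advisory, Unix only).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 20)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks) == payload
```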

I have the feeling that someone had a patch to do that with a loopback 
mount, but I can't find a pointer.

-- 
Bill Davidsen [off-list ref]
  Unintended results are the well-earned reward for incompetence.