Re: md-raid paranoia mode?
From: NeilBrown <hidden>
Date: 2014-06-12 06:45:07
On Thu, 12 Jun 2014 12:28:14 +0600 Roman Mamedov [off-list ref] wrote:
> On Thu, 12 Jun 2014 10:15:32 +0800 Brad Campbell [off-list ref] wrote:
> > On 11/06/14 14:48, Bart Kus wrote:
> > > Hello, as far as I understand, md-raid relies on the underlying devices to inform it of IO errors before it'll seek redundant/parity data to fulfill the read request. I have, however, seen certain hard drives report successful reads while returning garbage data.
> >
> > If you have drives that return garbage as valid data then you have far greater problems than what you are suggesting will fix. So much so I suggest you document these instances and start banging a drum announcing them in a name and shame campaign. That sort of behavior from storage devices is never ok, and the manufacturer needs to know that.
>
> If your RAM can return garbage, that's not a justification for having ECC RAM. ECC RAM is a gimmick invented by weak conformist people. Instead, you should go and loudly scream at the manufacturer who sold you that RAM! Errors from RAM are never OK! RAM should always work perfectly! And if it doesn't, you have greater problems. We shall not tolerate this behavior! So go get a drum and start banging it as loudly as you can! Name and shame the manufacturer who sold you that RAM. Fight the power, brother!!!
Your screwdriver is leaking (*).

Hard drives contain ECC. It should ensure undetected errors are an *extremely* rare event (more rare than bugs in the md code). If your ECC RAM started returning bad data without telling you, would you build a complex virtual memory system to load every byte from two different DIMMs into CPU registers and compare them before trusting them?

I know that hard drives can return bad data. I've seen it happen. I don't think that trying to "fix" it in the md/raid layer is appropriate. File-systems and higher level data management systems (e.g. git) are much better placed to detect such errors than md/raid is. Supposedly btrfs will DTRT with your drives (though TRT is to RMA them, and I don't think btrfs has an RMA plugin yet).
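To illustrate the point about higher layers being better placed to catch silent corruption, here is a minimal sketch of a content-addressed store, loosely in the spirit of how git names objects by a hash of their content. It is not md, btrfs, or git code; the names `store_blob`, `load_blob`, and `CorruptBlobError` are hypothetical, and SHA-256 is an arbitrary choice of digest. The reader recomputes the digest on every load, so a read that "succeeds" but returns garbage is noticed without any help from the block layer.

```python
import hashlib
import os
import tempfile


class CorruptBlobError(Exception):
    """Raised when stored bytes no longer match the digest they were filed under."""


def store_blob(directory, data):
    """Write data to a file named after its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    with open(os.path.join(directory, digest), "wb") as f:
        f.write(data)
    return digest


def load_blob(directory, digest):
    """Read a blob back and verify it still hashes to its own name."""
    with open(os.path.join(directory, digest), "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != digest:
        raise CorruptBlobError(digest)
    return data


def demo():
    """Store a blob, damage it behind the store's back, and try to load it."""
    with tempfile.TemporaryDirectory() as d:
        key = store_blob(d, b"important data")
        # Simulate a drive returning wrong bytes with no IO error:
        # overwrite the first byte of the stored file in place.
        with open(os.path.join(d, key), "r+b") as f:
            f.write(b"X")
        try:
            load_blob(d, key)
        except CorruptBlobError:
            return "corruption detected"
        return "corruption missed"
```

A check like this only detects the damage; recovering the data still needs a second good copy from somewhere, which is the part a checksumming filesystem with its own redundancy can automate.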
> You can probably tell just how sick I am of reasoning like yours. That's why we can't have nice things (md-side resiliency for the cases when you need/want it), and sadly Neil is of the same opinion as you.
In general, if you want nice things you need to pay for them. If you are willing to pay I suspect you can find someone who is willing to provide.

NeilBrown

(*) http://www.zazzle.com/a_bad_analogy_is_like_a_leaky_screwdriver_tshirts-235102919981826183