Re: Adventures in btrfs raid5 disk recovery
From: Chris Murphy <hidden>
Date: 2016-06-24 17:40:59
On Fri, Jun 24, 2016 at 4:16 AM, Hugo Mills [off-list ref] wrote:
> On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote:
>> Yes, that is what I wrote below. But that means that RAID5 with one degraded disk won't be able to reconstruct data on this degraded disk because reconstructed extent content won't match checksum. Which kinda makes RAID5 pointless.
>
> Eh? How do you come to that conclusion?
>
> For data, say you have n-1 good devices, with n-1 blocks on them. Each block has a checksum in the metadata, so you can read that checksum, read the blocks, and verify that they're not damaged. From those n-1 known-good blocks (all data, or one parity and the rest data) you can reconstruct the remaining block. That reconstructed block won't be checked against the csum for the missing block -- it'll just be written and a new csum for it written with it.

The last sentence is hugely problematic. Parity doesn't appear to be either CoW'd or checksummed. If it is used for reconstruction and the reconstructed data isn't compared to the data's EXTENT_CSUM entry, but that entry is instead recomputed and written, that's just like blindly trusting that the parity is correct and then authenticating it with a csum.

It's not difficult to test. Corrupt one byte of parity. Yank a drive. Add a new one. Start a reconstruction with scrub or balance (or both, to see if they differ) and find out what happens.

What should happen is that the reconstruction works for everything except that one file. If it's reconstructed silently, it should contain visible corruption and we all collectively raise our eyebrows.

--
Chris Murphy
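
To make the concern concrete, here is a toy model of the scenario, not btrfs code. It assumes a single stripe of three data blocks with XOR parity, and uses zlib.crc32 purely as a stand-in for the per-block data csum (btrfs's default is crc32c, which the Python standard library does not provide). The names xor_blocks, csum and rebuild are made up for illustration. It shows that a block rebuilt from corrupted parity fails the comparison against the stored csum, whereas recomputing the csum from the rebuilt block would simply bless the bad data.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def csum(block):
    """Stand-in checksum (plain CRC-32, not btrfs's crc32c)."""
    return zlib.crc32(block)

def rebuild(surviving, parity):
    """Rebuild the one missing block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])

# One stripe: three data blocks, their stored csums, and the parity block.
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
stored_csums = [csum(b) for b in data]
parity = xor_blocks(data)

# Corrupt one byte of parity. Nothing checksums parity in this model.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]

# "Yank" the device holding data[2], then rebuild it both ways.
good = rebuild(data[:2], parity)
bad = rebuild(data[:2], bad_parity)

# Verifying against the stored csum catches the bad reconstruction.
verified_ok = csum(good) == stored_csums[2]
corruption_detected = csum(bad) != stored_csums[2]

# Recomputing the csum instead matches the bad data by construction.
blind_csum = csum(bad)

print("clean parity rebuild verifies:", verified_ok)
print("bad parity rebuild detected:  ", corruption_detected)
```

In this model the safe behaviour is to compare the rebuilt block against stored_csums[2] and report an error on mismatch. Whether btrfs actually does that is exactly what the test described above would reveal.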