Re: Recover from "couldn't read tree root"?
From: Chris Murphy <hidden>
Date: 2021-06-20 22:53:51
On Sun, Jun 20, 2021 at 3:31 PM Nathan Dehnel [off-list ref] wrote:
> > Was bcache in write back or write through mode?
> Writeback.
OK, that's bad in this configuration, because it means all the writes go to the SSD and could sit there for minutes, hours, days, or longer. That means it's even possible the current supers are only on the SSDs, as well as other critical btrfs metadata.

My best guess now is to assume one of the SSDs is bad and spewing garbage or zeros. Assemble the array degraded with just one SSD and see if you can mount. If not, then it's the other SSD you need to assemble degraded. There's a way to set a drive manually as faulty so it won't assemble; I also thought of using sysfs, but on my own system /sys/block/nvme0n1/device/delete does not exist like it does for SATA SSDs.

Next you have to wrestle with this dilemma. If you pick the bad SSD, you don't want bcache flushing anything from it to your HDDs or it'll just corrupt them, right? If you pick the good SSD, you actually do want bcache to flush it all to the drives, so they're in a good state and you can optionally decouple the SSD entirely, leaving just the individual drives again.

I think you might want to use 'blockdev --setro' on all the block devices, SSD and HDD, to prevent any changes. You might get some complaints from bcache if it can't write to the HDDs or even to the SSDs, and that might make it look like you've picked the bad SSD. But the real test is whether you can mount the btrfs. Try that with 'mount -o ro,nologreplay,usebackuproot'; if you can at least get that far and do some basic navigation, that's probably the good SSD. If you still get mount failure, it's probably the bad one.

If you get a successful ro mount, I'd take advantage of it and back up anything important. Just get it out now. Then you can try it all again with everything read-write, but with the bad SSD still disabled and the md array assembled degraded with the good SSD, and see if you can mount read-write.
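As a rough sketch of that read-only test: I haven't run this against your setup, and every device name, the md array name, and the mount point below are placeholders I made up, so substitute your own. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only. All names below are assumed placeholders, not from this thread.
# Commands are printed, and executed only when DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
SSD="${SSD:-/dev/nvme0n1}"              # the one SSD being tested (assumed)
HDDS="${HDDS:-/dev/sda /dev/sdb}"       # bcache backing drives (assumed)
MD="${MD:-/dev/md0}"                    # md array holding the cache (assumed)
BTRFS_DEV="${BTRFS_DEV:-/dev/bcache0}"  # device btrfs is mounted from (assumed)
MNT="${MNT:-/mnt}"                      # mount point (assumed)

run() {
    # Print the command; run it only when dry-run is switched off.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

readonly_test_mount() {
    # 1. Block writes at the block layer, SSD and HDDs alike.
    for dev in $SSD $HDDS; do
        run blockdev --setro "$dev" || return 1
    done
    # 2. Assemble the md array degraded from the single SSD, read-only.
    run mdadm --assemble --run --readonly "$MD" "$SSD" || return 1
    # 3. The real test: does the btrfs mount read-only?
    run mount -o ro,nologreplay,usebackuproot "$BTRFS_DEV" "$MNT"
}

readonly_test_mount
```

If the mount fails, stop the array, swap SSD for the other one, and repeat. Whether the md member is the whole device or a partition depends on how you built the array, so adjust SSD accordingly.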
You need to be read-write at the block device layer to get bcache to flush SSD state to the drives, which I think is done by setting the mode to writethrough and then waiting until bcache/state is clean. The HDDs need to be writable, but btrfs doesn't need to be mounted for this.

The other possibility is that there is some bad data on both SSDs, in which case this fails and chances are the btrfs is toast.

--
Chris Murphy
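For the flush step, a minimal sketch along these lines might do. It assumes the device shows up as bcache0 (a placeholder) and that switching cache_mode to writethrough is in fact what gets the dirty data written back, which I haven't verified. The function name and the POLL_SECONDS variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: switch one bcache device to writethrough, then poll its
# sysfs "state" file until it reads "clean".
flush_bcache() {
    dir="$1"           # e.g. /sys/block/bcache0/bcache (bcache0 is assumed)
    tries="${2:-0}"    # maximum number of checks; 0 means wait indefinitely
    # Stop new dirty data from accumulating in the cache.
    echo writethrough > "$dir/cache_mode" || return 1
    n=0
    while [ "$(cat "$dir/state")" != "clean" ]; do
        n=$((n + 1))
        if [ "$tries" -gt 0 ] && [ "$n" -ge "$tries" ]; then
            echo "still $(cat "$dir/state") after $n checks" >&2
            return 1
        fi
        sleep "${POLL_SECONDS:-10}"
    done
    echo "$dir is clean"
}

# Example, as root, once per bcache device, with nothing set read-only:
#   flush_bcache /sys/block/bcache0/bcache
```

Only do this once you're confident you've picked the good SSD, since whatever is in the cache ends up on the HDDs.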