Re: Recover from "couldn't read tree root"?
From: Nathan Dehnel <hidden>
Date: 2021-06-20 21:49:28
> I suggest searching logs since the last time this file system was
> working, because the above error is indicating a problem that's
> already happened and what we need to know is what happened, if
> possible. Something like this:
>
> journalctl --since=-5d -k -o short-monotonic --no-hostname | grep "Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\| mmc\| nvme\| usb\| vd"

Unfortunately I put my journal logs in a different subvolume so they wouldn't bloat my snapshots so they weren't included in my backups.

> So I'm gonna guess a single shared SSD, which is a single point of
> failure, and it's spitting out garbage or zeros.

It's 2 SSDs in mdraid RAID10.

> But I'm not even close to a bcache expert so you might want to ask
> bcache developers how to figure out what state bcache is in and
> whether and how to safely decouple it from the backing drives so that
> you can engage in recovery attempts.

They didn't respond the last couple of times I've asked a question on their IRC or mailing list.

On Sun, Jun 20, 2021 at 9:19 PM Chris Murphy [off-list ref] wrote:
> On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel [off-list ref] wrote:
>> A machine failed to boot, so I tried to mount its root partition from
>> systemrescuecd, which failed:
>>
>> [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
>> [ 5404.240022] BTRFS info (device bcache3): has skinny extents
>> [ 5404.243195] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> [ 5404.243279] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> [ 5404.243362] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> [ 5404.243432] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
>> [ 5404.244114] BTRFS error (device bcache3): open_ctree failed
>
> This is generally bad, and means some lower layer did something wrong,
> such as getting write order incorrect, i.e. failing to properly honor
> flush/fua. Recovery can be difficult and take a while.
>
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#parent_transid_verify_failed
>
> I suggest searching logs since the last time this file system was
> working, because the above error is indicating a problem that's
> already happened and what we need to know is what happened, if
> possible. Something like this:
>
> journalctl --since=-5d -k -o short-monotonic --no-hostname | grep "Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\| mmc\| nvme\| usb\| vd"
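
When the kernel log is long, the repeated transid errors can be collapsed into one line per failing block. The following is only a sketch, assuming the messages have the same shape as the ones quoted above and arrive one per line; the function name summarize_transid is made up for illustration. It reports how far the found transid lags behind the wanted one (447 for the log above).

```shell
# Summarize "parent transid verify failed" messages read from stdin.
# Assumes one kernel message per line, formatted like the quoted log.
# Prints one line per distinct (block, wanted, found) combination.
summarize_transid() {
    awk '
    /parent transid verify failed on/ {
        # Pick out the value that follows each keyword on the line.
        for (i = 1; i < NF; i++) {
            if ($i == "on")     block  = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        count[block " " wanted " " found]++
    }
    END {
        for (key in count) {
            split(key, f, " ")
            printf "block=%s wanted=%s found=%s gap=%d seen=%d\n",
                   f[1], f[2], f[3], f[2] - f[3], count[key]
        }
    }'
}
```

It could be fed from the rescue environment with something like `dmesg | summarize_transid`, or from a saved log file on stdin.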

>> btrfs rescue super-recover -v /dev/bcache0 returned this:
>>
>> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
>> Ignoring transid failure
>> ERROR: could not setup extent tree
>> Failed to recover bad superblocks
>
> OK something is really wrong if you're not able to see a single
> superblock on any of the bcache devices. Every member device has 3
> super blocks, given the sizes you've provided. For there to not be a
> single one is a spectacular failure as if the bcache cache device
> isn't returning correct information for any of them.
>
> So I'm gonna guess a single shared SSD, which is a single point of
> failure, and it's spitting out garbage or zeros.
>
> But I'm not even close to a bcache expert so you might want to ask
> bcache developers how to figure out what state bcache is in and
> whether and how to safely decouple it from the backing drives so that
> you can engage in recovery attempts. If bcache mode is write through,
> there's a chance the backing drives have valid btrfs metadata, and
> it's just that on read the SSD is returning bogus information.
>
> --
> Chris Murphy
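
On the question of which mode bcache is in: a read-only sketch, assuming the usual bcache sysfs attributes cache_mode and state under /sys/block/bcacheN/bcache/ are present in the rescue environment. The function name bcache_report and its optional directory argument are hypothetical; the argument exists only so the function can be pointed at a directory other than /sys/block. It changes nothing and does not detach the cache.

```shell
# Print the active cache mode and backing-device state of each bcache device.
# $1 is an optional directory to scan (hypothetical override); default /sys/block.
bcache_report() {
    root="${1:-/sys/block}"
    for dev in "$root"/bcache*; do
        # Skip when nothing matched the glob or the attributes are missing.
        [ -d "$dev/bcache" ] || continue
        # cache_mode lists every mode and puts the active one in [brackets].
        mode=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$dev/bcache/cache_mode")
        state=$(cat "$dev/bcache/state")
        printf '%s mode=%s state=%s\n' "${dev##*/}" "$mode" "$state"
    done
}
```

Run with no argument, `bcache_report` reads /sys/block. A mode of writethrough would fit the hopeful case described above, while a state of dirty would suggest the cache still holds data that never reached the backing drives.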