Re: Recover from "couldn't read tree root"?
From: Chris Murphy <hidden>
Date: 2021-06-20 21:20:00
On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel [off-list ref] wrote:
A machine failed to boot, so I tried to mount its root partition from systemrescuecd, which failed:

[ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
[ 5404.240022] BTRFS info (device bcache3): has skinny extents
[ 5404.243195] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243279] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243362] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243432] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
[ 5404.244114] BTRFS error (device bcache3): open_ctree failed
This is generally bad. It means some lower layer did something wrong, such as getting write order incorrect, i.e. failing to properly honor flush/FUA. Recovery can be difficult and take a while.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#parent_transid_verify_failed

I suggest searching the logs going back to the last time this file system was working. The error above reports a problem that has already happened, and what we need to know, if possible, is what caused it. Something like this:

journalctl --since=-5d -k -o short-monotonic --no-hostname | grep "Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\| mmc\| nvme\| usb\| vd"
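To make the transid message easier to read, here is a minimal Python sketch that pulls the block address and the two generation numbers out of log text and reports how far behind the block on disk is. The function names (transid_failures, generation_gap) are made up for illustration, and the regular expression only assumes the message wording quoted in this thread.

```python
import re

# Matches the message wording quoted in this thread, with or without
# the kernel timestamp and "BTRFS error (device ...)" prefix.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_failures(log_text):
    """Return the unique (bytenr, wanted, found) tuples found in log_text, sorted."""
    seen = set()
    for m in TRANSID_RE.finditer(log_text):
        seen.add((int(m["bytenr"]), int(m["wanted"]), int(m["found"])))
    return sorted(seen)


def generation_gap(wanted, found):
    """Number of transactions by which the block read back lags what the parent expects."""
    return wanted - found


if __name__ == "__main__":
    sample = (
        "[ 5404.243195] BTRFS error (device bcache3): parent transid verify "
        "failed on 3004631449600 wanted 1420882 found 1420435\n"
    )
    for bytenr, wanted, found in transid_failures(sample):
        print(bytenr, wanted, found, generation_gap(wanted, found))
```

For the numbers in this report, the block at 3004631449600 is 447 generations older than the generation its parent points to, which fits the picture of a stale copy being returned on read.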
btrfs rescue super-recover -v /dev/bcache0 returned this:

parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
Ignoring transid failure
ERROR: could not setup extent tree
Failed to recover bad superblocks
OK, something is really wrong if you're not able to see a single superblock on any of the bcache devices. Every member device has 3 superblocks, given the sizes you've provided. For there to not be a single one is a spectacular failure, as if the bcache cache device isn't returning correct information for any of them. So I'm gonna guess a single shared SSD, which is a single point of failure, and it's spitting out garbage or zeros.

But I'm not even close to a bcache expert, so you might want to ask the bcache developers how to figure out what state bcache is in, and whether and how to safely decouple it from the backing drives so that you can engage in recovery attempts. If the bcache mode is writethrough, there's a chance the backing drives have valid btrfs metadata, and it's just that on read the SSD is returning bogus information.

--
Chris Murphy
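On the point about 3 superblocks per device: btrfs keeps its superblock copies at fixed offsets of 64 KiB, 64 MiB and 256 GiB, and each copy carries the magic string _BHRfS_M at byte 0x40. A rough Python sketch of which copies to expect for a given device size, plus a read-only magic check, follows. The function names are hypothetical, the size check is an approximation of what the kernel does, and this is no substitute for btrfs-progs.

```python
import io

# Fixed superblock locations: primary plus two mirrors.
SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
SUPERBLOCK_SIZE = 4096
MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40  # position of the magic inside a superblock


def expected_superblocks(device_size):
    """Offsets of the superblock copies that fit on a device of device_size bytes."""
    return [off for off in SUPERBLOCK_OFFSETS
            if off + SUPERBLOCK_SIZE <= device_size]


def has_btrfs_magic(dev, offset):
    """True if the superblock copy at offset in the binary file object dev has the btrfs magic."""
    dev.seek(offset + MAGIC_OFFSET)
    return dev.read(len(MAGIC)) == MAGIC


if __name__ == "__main__":
    # In-memory stand-in for a device: zeros, with a magic at the primary location.
    image = io.BytesIO(bytes(64 * 1024 + MAGIC_OFFSET) + MAGIC + bytes(SUPERBLOCK_SIZE))
    print(expected_superblocks(300 * 1024**3))
    print(has_btrfs_magic(image, SUPERBLOCK_OFFSETS[0]))
```

Against real hardware you would open the block device in "rb" mode only, and ideally compare what the bcache device returns with what the backing drive returns, once the bcache developers have confirmed that it is safe to read the backing drive directly.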