Thread: 8 messages, 2 authors, 2021-06-22

Re: Recover from "couldn't read tree root"?

From: Chris Murphy <hidden>
Date: 2021-06-20 21:20:00

On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel [off-list ref] wrote:
> A machine failed to boot, so I tried to mount its root partition from
> systemrescuecd, which failed:
>
> [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
> [ 5404.240022] BTRFS info (device bcache3): has skinny extents
> [ 5404.243195] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243279] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243362] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243432] BTRFS error (device bcache3): parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
> [ 5404.244114] BTRFS error (device bcache3): open_ctree failed

This is generally bad, and means some lower layer did something wrong,
such as getting write order incorrect, i.e. failing to properly honor
flush/fua. Recovery can be difficult and take a while.
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#parent_transid_verify_failed
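
To make the "wanted/found" numbers concrete, here is a minimal sketch in plain sh. The helper name transid_gap is hypothetical (it is not a btrfs tool); it only pulls the two generation numbers out of a message like the one quoted above and subtracts them. A positive result means the block read from disk is older than the generation its parent expects, i.e. newer writes never landed or are not being returned.

```shell
#!/bin/sh
# transid_gap is a hypothetical helper, not part of btrfs-progs.
# $1: one complete "parent transid verify failed" message on a single line.
transid_gap() {
    pattern='s/.*wanted \([0-9][0-9]*\) found \([0-9][0-9]*\).*'
    wanted=$(printf '%s\n' "$1" | sed -n "${pattern}/\1/p")
    found=$(printf '%s\n' "$1" | sed -n "${pattern}/\2/p")
    # Fail if the message does not contain both numbers.
    [ -n "$wanted" ] && [ -n "$found" ] || return 1
    # Number of generations (transactions) the on-disk block lags behind.
    echo $((wanted - found))
}

transid_gap "parent transid verify failed on 3004631449600 wanted 1420882 found 1420435"
# prints 447
```

A gap of several hundred generations, as here, is far more than a single lost transaction.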

I suggest searching the logs from the period since this file system
last worked. The error above reports a problem that has already
happened; what we need to find out, if possible, is what caused it.
Something like this:

journalctl --since=-5d -k -o short-monotonic --no-hostname | grep "Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\| mmc\| nvme\| usb\| vd"


> btrfs rescue super-recover -v /dev/bcache0 returned this:
>
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> Ignoring transid failure
> ERROR: could not setup extent tree
> Failed to recover bad superblocks

OK, something is really wrong if you're not able to see a single
superblock on any of the bcache devices. Every member device has three
superblocks, given the sizes you've provided. For there not to be a
single one is a spectacular failure, as if the bcache cache device
isn't returning correct information for any of them. So my guess is a
single shared SSD, which is a single point of failure, and it's
returning garbage or zeros. But I'm not even close to a bcache expert,
so you might want to ask the bcache developers how to figure out what
state bcache is in, and whether and how to safely decouple it from the
backing drives so that you can attempt recovery.
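
For reference, a rough sketch of where the "three superblocks" figure comes from. Btrfs keeps superblock copies at fixed byte offsets of 64 KiB, 64 MiB and 256 GiB, and a copy exists only if the device is large enough to hold it. The function name superblock_copies and the 1 TB example size are assumptions for illustration (the actual device sizes are not repeated in this message), and the exact boundary comparison is my approximation of the real check.

```shell
#!/bin/sh
# superblock_copies is a hypothetical helper, not part of btrfs-progs.
# $1: device size in bytes. Prints how many superblock copies fit.
superblock_copies() {
    size=$1
    count=0
    # Fixed superblock offsets: 64 KiB, 64 MiB, 256 GiB.
    for offset in 65536 67108864 274877906944; do
        # Each superblock is 4 KiB; count it only if it fits entirely
        # (boundary handling here is an approximation).
        if [ $((offset + 4096)) -lt "$size" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Assumed example: a nominal 1 TB drive.
superblock_copies 1000204886016
# prints 3
```

On a real system, "btrfs inspect-internal dump-super -a /dev/bcache0" should print every copy it can read, which is a read-only way to compare them.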

If the bcache mode is writethrough, there's a chance the backing
drives have valid btrfs metadata, and it's just that on read the SSD
is returning bogus information.
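
One way to check the mode, as a sketch: bcache exposes it in sysfs as a list of modes with the active one in square brackets. The helper name active_cache_mode is hypothetical, and the sysfs path in the comment assumes the device is bcache0 and is still registered.

```shell
#!/bin/sh
# active_cache_mode is a hypothetical helper.
# $1: contents of a bcache cache_mode file, e.g.
#     "writethrough [writeback] writearound none"
# Prints the bracketed (active) mode, or nothing if none is marked.
active_cache_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# On a live system (path is an assumption for device bcache0):
#   active_cache_mode "$(cat /sys/block/bcache0/bcache/cache_mode)"
active_cache_mode "writethrough [writeback] writearound none"
# prints writeback
```

If this reports writeback rather than writethrough, the backing drives may be missing data that only ever reached the SSD, so ask the bcache developers before detaching anything.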





-- 
Chris Murphy