Thread: 8 messages, 2 authors, 2021-06-22

Re: Recover from "couldn't read tree root"?

From: Chris Murphy <hidden>
Date: 2021-06-20 22:53:51

On Sun, Jun 20, 2021 at 3:31 PM Nathan Dehnel [off-list ref] wrote:
> > Was bcache in write back or write through mode?
> Writeback.

Ok that's bad in this configuration because it means all the writes go
to the SSD and could be there for minutes, hours, days, or longer.
That means it's even possible the current supers are only on the SSDs,
as well as other critical btrfs metadata.

My best guess now is to assume one of the drives is bad and spewing
garbage or zeros. And assemble the array degraded with just one SSD
drive, and see if you can mount. If not, then it's the other SSD you
need to assemble degraded. There's a way to set a drive manually as
faulty so it won't assemble; I also thought of using sysfs but on my
own system, /sys/block/nvme0n1/device/delete does not exist like it
does for SATA SSDs.
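
A rough sketch of the degraded assembly, assuming the SSDs form an md
array. The names /dev/md0 and /dev/nvme0n1p1 are placeholders, not
taken from this thread, and the helper functions are hypothetical. By
default it only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assemble an md array from a single member device.
# --run asks mdadm to start the array even though fewer devices are
# given than were present when it was last active.
# $1 = md device (placeholder), $2 = the one SSD member to trust (placeholder)
assemble_degraded() {
    run mdadm --assemble --run "$1" "$2"
}

# Example (prints the command; set DRY_RUN=0 to actually run it):
#   assemble_degraded /dev/md0 /dev/nvme0n1p1
```

If the mount test fails with one member, the same call with the other
member device covers the second case.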

Next you have to wrestle with this dilemma. If you pick the bad SSD,
you don't want bcache flushing anything from it to your HDDs or it'll
just corrupt them, right? If you pick the good SSD, you actually do
want bcache to flush it all to the drives, so they're in a good state
and you can optionally decouple the SSD entirely so that you're left
with just the individual drives again.

I think you might want to use 'blockdev --setro' on all the block
devices, SSD and HDD, to prevent any changes. You might get some
complaints from bcache if it can't write to HDDs or even to the SSDs,
so that might look like you've picked the bad SSD. But the real test
is if you can mount the btrfs. Try that with 'mount -o
ro,nologreplay,usebackuproot' and if you can at least get that far and
do some basic navigation, that's probably the good SSD. If you still
get mount failure, it's probably the bad one.
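
The read-only test could be sketched roughly as below. The device
names and the mount point are placeholders, and the helper functions
are hypothetical; only 'blockdev --setro' and the mount options come
from the text above. It defaults to printing the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark every listed block device read-only at the block layer.
set_readonly() {
    for dev in "$@"; do
        run blockdev --setro "$dev"
    done
}

# Try a read-only btrfs mount that skips log replay and may fall back
# to a backup tree root.
# $1 = btrfs device (placeholder), $2 = mount point (placeholder)
try_ro_mount() {
    run mount -o ro,nologreplay,usebackuproot "$1" "$2"
}

# Example (prints the commands; set DRY_RUN=0 to actually run them):
#   set_readonly /dev/nvme0n1 /dev/sda /dev/sdb
#   try_ro_mount /dev/bcache0 /mnt
```

Printing first makes it easy to check that every SSD and HDD is in the
list before anything is touched.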

If you get a successful ro mount, I'd take advantage of it and back up
anything important. Just get it out now. And then you can try it all
again with everything read-write, but with the bad SSD still disabled
and the md array assembled degraded with the good SSD, and see if you
can mount read-write again. You need to be read-write at the block
device layer to get bcache to flush SSD state to the drives, which I
think is done by setting the mode to writethrough and then waiting
until bcache/state is clean. HDDs need to be writable but btrfs
doesn't need to be mounted for this.
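
A minimal sketch of that flush step, assuming the usual bcache sysfs
layout where each backing device has a directory containing
'cache_mode' and 'state'. The path /sys/block/bcache0/bcache is a
placeholder and the function name is hypothetical. It would need to be
repeated for each bcache device.

```shell
#!/bin/sh
# Sketch only. Switch one bcache device to writethrough and wait until
# its state file reports "clean".
# $1 = bcache sysfs directory, e.g. /sys/block/bcache0/bcache (placeholder)
flush_bcache() {
    dir="$1"

    # Stop caching new writes on the SSD; dirty data is written back
    # to the backing HDD in the background.
    echo writethrough > "$dir/cache_mode"

    # Poll until no dirty data remains on the cache device.
    while [ "$(cat "$dir/state")" != "clean" ]; do
        sleep "${POLL_SECONDS:-5}"
    done
    echo "clean"
}

# Example (needs root and writable block devices):
#   flush_bcache /sys/block/bcache0/bcache
```

Taking the sysfs directory as an argument keeps the loop the same for
every bcache device in the setup.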

The other possibility is that there is some bad data on both SSDs, in
which case the mount fails either way and chances are the btrfs is
toast.


-- 
Chris Murphy