Thread (18 messages) 18 messages, 4 authors, 2021-03-17

Re: All files are damaged after btrfs restore

From: Sebastian Roller <hidden>
Date: 2021-03-07 13:59:56

quoted
quoted
I don't know. The exact nature of the damage of a failing controller
is adding a significant unknown component to it. If it was just a
matter of not writing anything at all, then there'd be no problem. But
it sounds like it wrote spurious or corrupt data, possibly into
locations that weren't even supposed to be written to.
Unfortunately I cannot figure out exactly what happened. Logs end
Friday night while the backup script was running -- which also
includes a finalizing balancing of the device. Monday morning after
some exchange of hardware the machine came up being unable to mount
the device.
It's probably not discernible with logs anyway. What hardware does
when it goes berserk? It's chaos. And all file systems have write
order requirements. It's fine if at a certain point writes just
abruptly stop going to stable media. But if things are written out of
order, or if the hardware acknowledges critical metadata writes are
written but were actually dropped, it's bad. For all file systems.

quoted
OK -- I now had the chance to temporarily switch to 5.11.2. Output
looks cleaner, but the error stays the same.

root@hikitty:/mnt$ mount -o ro,rescue=all /dev/sdi1 hist/

[ 3937.815083] BTRFS info (device sdi1): enabling all of the rescue options
[ 3937.815090] BTRFS info (device sdi1): ignoring data csums
[ 3937.815093] BTRFS info (device sdi1): ignoring bad roots
[ 3937.815095] BTRFS info (device sdi1): disabling log replay at mount time
[ 3937.815098] BTRFS info (device sdi1): disk space caching is enabled
[ 3937.815100] BTRFS info (device sdi1): has skinny extents
[ 3938.903454] BTRFS error (device sdi1): bad tree block start, want
122583416078336 have 0
[ 3938.994662] BTRFS error (device sdi1): bad tree block start, want
99593231630336 have 0
[ 3939.201321] BTRFS error (device sdi1): bad tree block start, want
124762809384960 have 0
[ 3939.221395] BTRFS error (device sdi1): bad tree block start, want
124762809384960 have 0
[ 3939.221476] BTRFS error (device sdi1): failed to read block groups: -5
[ 3939.268928] BTRFS error (device sdi1): open_ctree failed
This looks like a super is expecting something that just isn't there
at all. If spurious behavior lasted only briefly during the hardware
failure, there's a chance of recovery. But this diminishes greatly if
the chaotic behavior was on-going for a while, many seconds or a few
minutes.

quoted
I still hope that there might be some error in the fs created by the
crash, which can be resolved instead of real damage to all the data in
the FS trees. I used a lot of snapshots and deduplication on that
device, so that I expect some damage by a hardware error. But I find
it hard to believe that every file got damaged.
Correct. They aren't actually damaged.

However, there's maybe 5-15 MiB of critical metadata on Btrfs, and if
it gets corrupt, the keys to the maze are lost. And it becomes
difficult, sometimes impossible, to "bootstrap" the file system. There
are backup entry points, but depending on the workload, they go stale
in seconds to a few minutes, and can be subject to being overwritten.

When 'btrfs restore' is doing partial recovery that ends up with a lot
of damage and holes tells me it's found stale parts of the file system
- it's on old rails so to speak, there's nothing available to tell it
that this portion of the tree is just old and not valid anymore (or
only partially valid), but also the restore code is designed to be
more tolerant of errors because otherwise it would just do nothing at
all.

I think if you're able to find the most recent root node for a
snapshot you want to restore, along with an intact chunk tree it
should be possible to get data out of that snapshot. The difficulty is
finding it, because it could be almost anywhere.
Would it make sense to just try  restore -t on any root I got with
btrfs-find-root with all of the snapshots?
OK so you said there's an original and backup file system, are they
both in equally bad shape, having been on the same controller? Are
they both btrfs?
The original / live file system was not btrfs but xfs. It is in a
different but equally bad state than the backup. We used bcache with a
write-back cache on a ssd which is now completely dead (does not get
recognized by any server anymore). To get the file system mounted I
ran xfs-repair. After that only 6% of the data was left and this is
nearly completely in lost+found. I'm now trying to sort these files by
type, since the data itself looks OK. Unfortunately the surviving
files seem to be the oldest ones.
What do you get for

btrfs insp dump-s -f /dev/sdXY

There might be a backup tree root in there that can be used with btrfs
restore -t
This is the output of ./btrfs  insp dump-s -f /dev/sdi1 run with
btrfs-progs 5.9.

./btrfs insp dump-s -f /dev/sdi1
superblock: bytenr=65536, device=/dev/sdi1
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x9e6891fc [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    56051c5f-fca6-4d54-a04e-1c1d8129fe56
metadata_uuid           56051c5f-fca6-4d54-a04e-1c1d8129fe56
label                   history
generation              825256
root                    122583415865344
sys_array_size          129
chunk_root_generation   825256
root_level              2
chunk_root              141944043454464
chunk_root_level        2
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             80013782134784
bytes_used              75176955760640
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        825256
uuid_tree_generation    825256
dev_item.uuid           844e80b3-a8d5-4738-ac8a-4f54980556f6
dev_item.fsid           56051c5f-fca6-4d54-a04e-1c1d8129fe56 [match]
dev_item.type           0
dev_item.total_bytes    80013782134784
dev_item.bytes_used     75413317484544
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          2
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 141944034426880)
                length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 2 sub_stripes 1
                        stripe 0 devid 2 offset 2034741805056
                        dev_uuid 844e80b3-a8d5-4738-ac8a-4f54980556f6
                        stripe 1 devid 2 offset 2034775359488
                        dev_uuid 844e80b3-a8d5-4738-ac8a-4f54980556f6
backup_roots[4]:
        backup 0:
                backup_tree_root:       122583415865344 gen: 825256     level: 2
                backup_chunk_root:      141944043454464 gen: 825256     level: 2
                backup_extent_root:     122583418175488 gen: 825256     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122583415783424 gen: 825256     level: 1
                backup_csum_root:       122583553703936 gen: 825256     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955760640
                backup_num_devices:     1

        backup 1:
                backup_tree_root:       122343302234112 gen: 825253     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122343333937152 gen: 825253     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122343380992000 gen: 825253     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955105280
                backup_num_devices:     1

        backup 2:
                backup_tree_root:       122343762804736 gen: 825254     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122343762935808 gen: 825254     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122343764967424 gen: 825254     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955105280
                backup_num_devices:     1

        backup 3:
                backup_tree_root:       122574011269120 gen: 825255     level: 2
                backup_chunk_root:      141944034426880 gen: 825251     level: 2
                backup_extent_root:     122574011432960 gen: 825255     level: 3
                backup_fs_root:         58363985428480  gen: 789775     level: 0
                backup_dev_root:        122077274357760 gen: 825250     level: 1
                backup_csum_root:       122574014791680 gen: 825255     level: 3
                backup_total_bytes:     80013782134784
                backup_bytes_used:      75176955236352
                backup_num_devices:     1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help