Re: All files are damaged after btrfs restore
From: Sebastian Roller <hidden>
Date: 2021-03-04 15:37:08
> I don't know. The exact nature of the damage of a failing controller is adding a significant unknown component to it. If it was just a matter of not writing anything at all, then there'd be no problem. But it sounds like it wrote spurious or corrupt data, possibly into locations that weren't even supposed to be written to.
Unfortunately I cannot figure out exactly what happened. The logs end Friday night while the backup script was running -- which also includes a final balance of the device. Monday morning, after some hardware was exchanged, the machine came up unable to mount the device.
> I think if the snapshot b-tree is ok, and the chunk b-tree is ok, then it should be possible to recover the data correctly without needing any other tree. I'm not sure if that's how btrfs restore already works. Kernel 5.11 has a new feature, mount -o ro,rescue=all that is more tolerant of mounting when there are various kinds of problems. But there's another thread where a failed controller is thwarting recovery, and that code is being looked at for further enhancement.
> https://lore.kernel.org/linux-btrfs/CAEg-Je-DJW3saYKA2OBLwgyLU6j0JOF7NzXzECi0HJ5hft_5=A@mail.gmail.com/
OK -- I now had the chance to temporarily switch to 5.11.2. Output looks cleaner, but the error stays the same.

root@hikitty:/mnt$ mount -o ro,rescue=all /dev/sdi1 hist/

[ 3937.815083] BTRFS info (device sdi1): enabling all of the rescue options
[ 3937.815090] BTRFS info (device sdi1): ignoring data csums
[ 3937.815093] BTRFS info (device sdi1): ignoring bad roots
[ 3937.815095] BTRFS info (device sdi1): disabling log replay at mount time
[ 3937.815098] BTRFS info (device sdi1): disk space caching is enabled
[ 3937.815100] BTRFS info (device sdi1): has skinny extents
[ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
[ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
[ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221476] BTRFS error (device sdi1): failed to read block groups: -5
[ 3939.268928] BTRFS error (device sdi1): open_ctree failed

I still hope that the crash only introduced some error in the fs that can be resolved, rather than real damage to all the data in the FS trees. I used a lot of snapshots and deduplication on that device, so I do expect some damage from a hardware error. But I find it hard to believe that every file got damaged.

Sebastian
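
For anyone triaging similar output, here is a minimal sketch (not part of any btrfs tooling) that collects the "bad tree block start" lines from a saved kernel log and counts how often each block address shows up. The function name summarize_bad_blocks and the SAMPLE text are made up for illustration; the regular expression only assumes the message format seen in the log above.

```python
import re
from collections import Counter

# Assumed message format, taken from the kernel log quoted in this mail.
PATTERN = re.compile(r"bad tree block start, want (\d+) have (\d+)")


def summarize_bad_blocks(log_text):
    """Return a Counter mapping (want, have) pairs to occurrence counts."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        want = int(match.group(1))
        have = int(match.group(2))
        counts[(want, have)] += 1
    return counts


# Hypothetical input: the error lines from the mount attempt above.
SAMPLE = """\
[ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
[ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
[ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
[ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
"""

if __name__ == "__main__":
    for (want, have), n in sorted(summarize_bad_blocks(SAMPLE).items()):
        print(f"want={want} have={have} seen={n}x")
```

In this log every mismatch reports "have 0". As far as I understand the check, that means the block address stored in the header read back as zero, which would fit a block that was zeroed or never written rather than one overwritten with unrelated data. That is an interpretation, not something the log proves.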