Re: btrfs cannot be mounted or checked

From: Qu Wenruo <hidden>
Date: 2021-07-11 12:00:19


On 2021/7/11 7:37 PM, Forza wrote:


On 2021-07-11 10:59, Zhenyu Wu wrote:

Sorry to bother you.
After an unclean reboot caused by a computer crash, my btrfs partition
cannot be mounted. The same thing happened before, but this time
`btrfs rescue zero-log` does not help.

$ uname -r
5.10.27-gentoo-x86_64
$ btrfs rescue zero-log /dev/sda2
Clearing log on /dev/sda2, previous log_root 0, level 0
$ mount /dev/sda2 /mnt/gentoo
mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on
/dev/sda2, missing codepage or helper program, or other error.
$ btrfs check /dev/sda2
parent transid verify failed on 34308096 wanted 962175 found 961764
parent transid verify failed on 34308096 wanted 962175 found 961764
parent transid verify failed on 34308096 wanted 962175 found 961764
Ignoring transid failure
leaf parent key incorrect 34308096
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
$ dmesg 2>&1|tee dmesg.txt
# see attachment

`mount -o ro,usebackuproot` does not work either.

Thanks for any help!


Hi!

A "parent transid verify failed" error is hard to recover from, as mentioned on
https://btrfs.wiki.kernel.org/index.php/FAQ#How_do_I_recover_from_a_.22parent_transid_verify_failed.22_error.3F


I see you have "corrupt 5" sectors in dmesg. Is your disk healthy? You
can check with "smartctl -x /dev/sda" to determine the health.

One way of avoiding this error is to disable the disk's write cache. "Parent
transid verify failed" can happen when the disk re-orders writes in its write
cache before flushing them to disk. This violates barriers, but it is
unfortunately common. If you have a crash, a SATA bus reset or other issues,
unwritten content is lost. The problem here is the re-ordering: the superblock
is written out before other metadata (which is now lost due to the crash).

To be more precise, all filesystems already take such re-ordering into
consideration.
That is why we have the flush (also called barrier) command, which forces the
disk to write all of its cache back to the media, or at least to non-volatile
cache.

Combined with mandatory metadata CoW, this means that whether the disk
re-orders writes or not, we should only ever see either the newer data from
after the flush or the older data from before the flush.
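
As a rough illustration of that ordering (a toy model only, not btrfs code):
plain files stand in for disk blocks, `sync` stands in for the flush command,
and the file names `tree.1`, `tree.2` and `super` are made up for the example.

```shell
#!/bin/sh
# Toy model of CoW plus flush ordering; files stand in for disk blocks.
set -eu
dir=$(mktemp -d)

# Generation 1: a tree block and a "superblock" that points at it.
printf 'tree gen 1\n' > "$dir/tree.1"
printf 'tree.1\n' > "$dir/super"
sync

# Generation 2: CoW writes the new tree to a NEW location and never
# overwrites tree.1 in place.
printf 'tree gen 2\n' > "$dir/tree.2"
sync    # flush: the new tree must be durable before anything points at it

# Only after that flush is the superblock switched over (rename is atomic).
printf 'tree.2\n' > "$dir/super.tmp"
mv "$dir/super.tmp" "$dir/super"
sync    # flush: the pointer update itself becomes durable

# A reader follows the superblock and finds a complete tree.
root=$(cat "$dir/super")
cat "$dir/$root"
```

If the power fails before the rename, `super` still names the intact
`tree.1`; if it fails afterwards, `tree.2` is already on disk. A device that
ignores the flush breaks exactly this guarantee.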

But unfortunately, hardware is unreliable and sometimes even lies about its
flush command.
Some disks, and especially some cheap RAID cards, simply ignore such flush
commands, which leaves the data corrupted after a power loss.

Thanks,
Qu

You can disable the write cache with "hdparm -W0 /dev/sda". It might be worth
running this from a cron job every 5 minutes or so, as the setting is not
persistent and can be reset if the disk loses power, goes to sleep, etc.
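
A minimal sketch of such a cron entry follows. The helper name
`wc_cron_entry` and the default path `/sbin/hdparm` are assumptions made for
illustration; check the real path with `command -v hdparm` on your system.

```shell
#!/bin/sh
# Print a crontab entry that re-applies "hdparm -W0" every 5 minutes.
# wc_cron_entry and the /sbin/hdparm default are illustrative assumptions.
wc_cron_entry() {
    dev=$1
    hdparm_bin=${2:-/sbin/hdparm}
    printf '*/5 * * * * %s -W0 %s\n' "$hdparm_bin" "$dev"
}

wc_cron_entry /dev/sda
```

The printed line would go into root's crontab (for example via `crontab -e`
as root). Running "hdparm -W /dev/sda" without a value reports the current
write-caching state, which is a quick way to confirm the setting took effect.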
