Re: btrfs cannot be mounted or checked

From: Zhenyu Wu <hidden>
Date: 2021-07-14 08:49:30

Sorry for the late reply :(

I found <https://bbs.archlinux.org/viewtopic.php?id=233724>, which looks
the same as my situation. But on my computer (booted from a live USB),
`btrfs check --init-extent-tree` printed a lot of non-ASCII characters
(maybe because ANSI escape codes messed up the terminal).
After several days it printed `7/7` and `killed`, so that solution seems
to have failed.

Sorry, my live USB doesn't have smartctl :(

$ hdparm -W0 /dev/sda
/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)

But now the btrfs partition still cannot be mounted.

When I try to mount it with `usebackuproot`, it prints the same
error message, and dmesg shows:

[250062.064785] BTRFS warning (device sda2): 'usebackuproot' is
deprecated, use 'rescue=usebackuproot' instead
[250062.064788] BTRFS info (device sda2): trying to use backup root at
mount time
[250062.064789] BTRFS info (device sda2): disk space caching is enabled
[250062.064790] BTRFS info (device sda2): has skinny extents
[250062.208403] BTRFS info (device sda2): bdev /dev/sda2 errs: wr 0,
rd 0, flush 0, corrupt 5, gen 0
[250062.277045] BTRFS critical (device sda2): corrupt leaf: root=2
block=273006592 slot=17 bg_start=1104150528 bg_len=1073741824, invalid
block group used, have 1073754112 expect [0, 1073741824)
[250062.277048] BTRFS error (device sda2): block=273006592 read time
tree block corruption detected
[250062.291924] BTRFS critical (device sda2): corrupt leaf: root=2
block=273006592 slot=17 bg_start=1104150528 bg_len=1073741824, invalid
block group used, have 1073754112 expect [0, 1073741824)
[250062.291927] BTRFS error (device sda2): block=273006592 read time
tree block corruption detected
[250062.291943] BTRFS error (device sda2): failed to read block groups: -5
[250062.292897] BTRFS error (device sda2): open_ctree failed

Without `usebackuproot`, dmesg shows the same log except for the first
two messages.
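
For reference, the numbers in the `corrupt leaf` message can be checked
by hand. The snippet below is only an illustrative sketch: the helper
name `block_group_overflow` is made up here, and the regular expression
assumes the message format exactly as it appears in the log above. It
extracts `bg_len` and the reported `used` value and computes how far
`used` exceeds the block group length.

```python
import re

# Assumption: the message looks like the dmesg lines quoted above.
_PATTERN = re.compile(
    r"bg_start=(?P<start>\d+) bg_len=(?P<length>\d+), "
    r"invalid block group used, have (?P<used>\d+)"
)


def block_group_overflow(message):
    """Return bg_start, bg_len, used and the excess (used - bg_len)."""
    # Undo the mail client's line wrapping before matching.
    flat = " ".join(message.split())
    match = _PATTERN.search(flat)
    if match is None:
        raise ValueError("not an 'invalid block group used' message")
    length = int(match.group("length"))
    used = int(match.group("used"))
    return {
        "bg_start": int(match.group("start")),
        "bg_len": length,
        "used": used,
        "excess": used - length,
    }


line = """BTRFS critical (device sda2): corrupt leaf: root=2
block=273006592 slot=17 bg_start=1104150528 bg_len=1073741824, invalid
block group used, have 1073754112 expect [0, 1073741824)"""

info = block_group_overflow(line)
print(info["excess"])  # 12288
```

So the block group item claims 12288 bytes (12 KiB) more than the 1 GiB
block group can hold, which is why the leaf is rejected at read time.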

Now btrfs check can check this partition:

$ btrfs check /dev/sda2 2>&1|tee check.txt
# see attachment

Is there any hope of rescuing my disk?
Thanks!

On 7/11/21, Qu Wenruo [off-list ref] wrote:


On 2021/7/11 7:37 PM, Forza wrote:


On 2021-07-11 10:59, Zhenyu Wu wrote:

Sorry to bother you.
After a dirty reboot caused by a computer crash, my btrfs partition
cannot be mounted. The same thing happened before, but this time `btrfs
rescue zero-log` does not help.

$ uname -r
5.10.27-gentoo-x86_64
$ btrfs rescue zero-log /dev/sda2
Clearing log on /dev/sda2, previous log_root 0, level 0
$ mount /dev/sda2 /mnt/gentoo
mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on
/dev/sda2, missing codepage or helper program, or other error.
$ btrfs check /dev/sda2
parent transid verify failed on 34308096 wanted 962175 found 961764
parent transid verify failed on 34308096 wanted 962175 found 961764
parent transid verify failed on 34308096 wanted 962175 found 961764
Ignoring transid failure
leaf parent key incorrect 34308096
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
$ dmesg 2>&1|tee dmesg.txt
# see attachment

`mount -o ro,usebackuproot` does not work either.

Thanks for any help!


Hi!

A "parent transid verify failed" error is hard to recover from, as
mentioned on
https://btrfs.wiki.kernel.org/index.php/FAQ#How_do_I_recover_from_a_.22parent_transid_verify_failed.22_error.3F


I see you have "corrupt 5" sectors in dmesg. Is your disk healthy? You
can check with "smartctl -x /dev/sda" to determine the health.

One way of avoiding this error is to disable the write cache. Parent
transid failures can happen when the disk re-orders writes in its write
cache before flushing to disk. This violates barriers, but it is
unfortunately common. If you have a crash, a SATA bus reset or other
issues, unwritten content is lost. The problem here is the re-ordering:
the superblock is written out before other metadata (which is now lost
due to the crash).

To be extra accurate, all filesystems take this re-ordering into
consideration.
That is why we have the flush (also called barrier) command, which
forces the disk to write all of its cache back to disk, or at least to
non-volatile cache.

Combined with mandatory metadata CoW, this means that whether or not the
disk re-orders writes, we should only ever see either the newer data
after the flush or the older data before the flush.

But unfortunately, hardware is unreliable and sometimes even lies about
its flush command.
Some disks, and especially some cheap RAID cards, just ignore such flush
commands, which leaves the data corrupted after a power loss.
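
To illustrate the point about flush and CoW, here is a deliberately
simplified toy model. It is not how btrfs or any real disk is
implemented; `ToyDisk`, `crash_after_superblock`, the block names "A"
and "B" and the generation numbers are all invented for this sketch.
Writes land in a volatile cache, `flush()` moves them to stable storage
only if the disk honors it, and the disk may write out a cached block on
its own before power is lost.

```python
class ToyDisk:
    """Toy disk with a volatile write cache (illustration only)."""

    def __init__(self, honors_flush):
        self.honors_flush = honors_flush
        self.stable = {}  # survives power loss
        self.cache = {}   # lost on power loss

    def write(self, addr, value):
        self.cache[addr] = value

    def flush(self):
        # A lying disk reports success but leaves data in the cache.
        if self.honors_flush:
            self.stable.update(self.cache)
            self.cache.clear()

    def destage(self, addr):
        # The disk writes one cached block out on its own (re-ordering).
        self.stable[addr] = self.cache.pop(addr)

    def power_loss(self):
        self.cache.clear()


def crash_after_superblock(honors_flush):
    """Return (generation the superblock wants, generation found)."""
    disk = ToyDisk(honors_flush)
    # Starting state: superblock at generation 1 points at block A.
    # Block B is free space that still holds stale, older data.
    disk.stable = {
        "super": {"root": "A", "generation": 1},
        "A": {"generation": 1},
        "B": {"generation": 0},
    }
    # Commit generation 2 with CoW: the new tree goes to B, A is untouched.
    disk.write("B", {"generation": 2})
    disk.flush()  # barrier before the superblock update
    disk.write("super", {"root": "B", "generation": 2})
    # The superblock reaches stable storage, then power is lost.
    disk.destage("super")
    disk.power_loss()

    superblock = disk.stable["super"]
    found = disk.stable[superblock["root"]]["generation"]
    return superblock["generation"], found


print(crash_after_superblock(honors_flush=True))   # (2, 2)
print(crash_after_superblock(honors_flush=False))  # (2, 0)
```

With an honest flush the superblock always points at metadata that is
already on stable storage. When the flush is ignored, the superblock can
end up pointing at a block that still holds older data, which in this toy
model mirrors the "wanted ... found ..." mismatch from btrfs check.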

Thanks,
Qu

You can disable the write cache with "hdparm -W0 /dev/sda". It might be
worth adding this to a cron job that runs every 5 minutes or so, as the
setting is not persistent and can get reset if the disk loses power,
goes to sleep, etc.
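
As a rough, untested sketch of the periodic re-apply idea: the small
script below only wraps the `hdparm -W0` command mentioned above so that
it can be run for one or more devices from a scheduled job. The function
names and the way devices are passed on the command line are assumptions
made for this example.

```python
#!/usr/bin/env python3
import subprocess
import sys


def hdparm_command(device):
    """Build the command that turns the drive's write cache off."""
    return ["hdparm", "-W0", device]


def disable_write_cache(devices, run=subprocess.run):
    """Run hdparm -W0 for each device and return {device: exit code}."""
    results = {}
    for device in devices:
        completed = run(hdparm_command(device), capture_output=True, text=True)
        results[device] = completed.returncode
    return results


if __name__ == "__main__":
    # Devices are taken from the command line, e.g. "/dev/sda".
    for device, returncode in disable_write_cache(sys.argv[1:]).items():
        if returncode != 0:
            print(f"hdparm failed for {device} (exit {returncode})",
                  file=sys.stderr)
```

A cron entry along the lines of
`*/5 * * * * /usr/local/sbin/disable-write-cache.py /dev/sda` (the path
is hypothetical) would then re-apply the setting every 5 minutes. It
needs root privileges and hdparm installed.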

Attachments

check.txt [text/plain] 80962 bytes
