Thread: 7 messages, 3 authors, 2021-12-21

Re: btrfs_free_extent

From: Qu Wenruo <hidden>
Date: 2021-12-19 23:42:20


On 2021/12/19 23:24, Tuetuopay wrote:
> Hi,
>
> I need some advice on a btrfs raid-1 volume that shows a few corruptions
> in some places. Some files triggered safeguards on write, which ended up
> remounting the fs read-only.
>
> Over on IRC, multicore suggested that I run a read-only check, whose
> output is here:
>
> # btrfs check --readonly /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
> Opening filesystem to check...
> Checking filesystem on /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
> UUID: e944a837-f89b-48ea-80fd-40b2bec8f21b
> [1/7] checking root items
> [2/7] checking extents
> tree backref 9882747355136 root 7 not found in extent tree
> backref 9882747355136 root 23 not referenced back 0x556ea3cb07d0

This is one corruption in the extent tree: we don't have root 23 at all.
Only root 7 is correct.

On the other hand, 23 = 0x17, while 7 = 0x07.

So, see a pattern here? The two values differ by exactly one bit (0x10).

Thus I recommend running memtest to make sure it's not a memory bitflip
causing the corruption in the first place.
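To make the pattern concrete: two values are a candidate single-bit flip when their XOR is a non-zero power of two. The Python sketch below is only an illustration (the helper name `single_bit_flip` is hypothetical and not part of btrfs-progs); it applies that test to the two root IDs from the check output.

```python
from typing import Optional


def single_bit_flip(a: int, b: int) -> Optional[int]:
    """Return the bit mask if a and b differ in exactly one bit, else None.

    Hypothetical helper for illustration; not part of btrfs-progs.
    """
    diff = a ^ b
    # A non-zero power of two has exactly one bit set, so x & (x - 1) == 0.
    if diff != 0 and diff & (diff - 1) == 0:
        return diff
    return None


# Root IDs taken from the "root 7" / "root 23" backref lines above.
bad_root, good_root = 23, 7
mask = single_bit_flip(bad_root, good_root)
if mask is not None:
    print(f"root {bad_root} (0x{bad_root:02x}) vs {good_root} (0x{good_root:02x}): "
          f"one bit differs, mask 0x{mask:02x}")
else:
    print("not a single-bit difference")
```

This only flags a suspicious pattern; it does not prove the RAM is bad, which is why running memtest is still the actual check.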

> incorrect global backref count on 9882747355136 found 2 wanted 1
> backpointer mismatch on [9882747355136 16384]
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> [4/7] checking fs roots
> root 5 inode 1626695 errors 40000
> Dir items with mismatch hash:
> 	name: fendor.qti.hardware.sigma_miracast@1.0-impl.so namelen: 46 wanted 0x12c67915 has 0x0471bc31
> root 5 inode 1626696 errors 2000, link count wrong
> 	unresolved ref dir 1626695 index 2 namelen 46 name vendor.qti.hardware.sigma_miracast@1.0-impl.so filetype 1 errors 1, no dir item

This can also be caused by a memory bitflip: "fendor" vs "vendor" is the
same one-bit (0x10) difference, since 'f' is 0x66 and 'v' is 0x76.

Fortunately, both cases should be repairable.
But that should only be done after you have checked your memory.
You don't want unreliable memory, which can definitely cause more
damage during repair.

But it's still better to keep important data backed up.
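The "Dir items with mismatch hash" error can be illustrated with a small Python sketch. It assumes, without being authoritative, that the btrfs name hash is CRC-32C (Castagnoli) seeded with `~1` (0xFFFFFFFE) and returned without a final inversion, mirroring `btrfs_name_hash()`; check that against the kernel or btrfs-progs source before relying on it. The helpers `crc32c_update`, `name_hash` and `byte_diffs` are hypothetical names used only here. The sketch locates the single differing byte between the two names and shows that it changes the hash.

```python
from typing import List, Tuple


def crc32c_update(crc: int, data: bytes) -> int:
    """Raw CRC-32C update (reflected polynomial 0x82F63B78).

    No initial or final inversion is applied; the caller supplies the seed.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc & 0xFFFFFFFF


def name_hash(name: bytes) -> int:
    # ASSUMPTION: mirrors btrfs_name_hash() as crc32c((u32)~1, name, len),
    # with no final inversion. Not verified against btrfs-progs here.
    return crc32c_update(0xFFFFFFFE, name)


def byte_diffs(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_a, byte_b) for each differing position (equal lengths)."""
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


on_disk = b"fendor.qti.hardware.sigma_miracast@1.0-impl.so"   # name in the dir item
expected = b"vendor.qti.hardware.sigma_miracast@1.0-impl.so"  # name in the inode ref

for off, x, y in byte_diffs(on_disk, expected):
    print(f"offset {off}: {chr(x)!r} 0x{x:02x} vs {chr(y)!r} 0x{y:02x}, xor 0x{x ^ y:02x}")

# Compare these with the "wanted"/"has" pair reported by btrfs check.
print(f"hash({on_disk.decode()}) = 0x{name_hash(on_disk):08x}")
print(f"hash({expected.decode()}) = 0x{name_hash(expected):08x}")
```

Because any CRC detects every single-bit error, the two hashes are guaranteed to differ; the sketch deliberately does not claim which printed value corresponds to `wanted` and which to `has`.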

> ERROR: errors found in fs roots
> found 6870080626688 bytes used, error(s) found
> total csum bytes: 6668958308
> total tree bytes: 9075539968
> total fs tree bytes: 1478344704
> total extent tree bytes: 243793920
> btree space waste bytes: 820626944
> file data blocks allocated: 326941710356480
>   referenced 6854941941760
>
> They suggested that I run a non-read-only check, but warned that it could do
> more harm than good, hence this email seeking advice. Does check have any
> chance of fixing the issue?
>
> I should also mention that I'm fine with deleting those specific
> files, as I can get them back fairly easily.
>
> To finish off, here is the information requested by the wiki page:
>
> $ uname -a
> Linux gimli 5.10.70-3ware #1 SMP Wed Dec 15 03:46:13 CET 2021 x86_64 GNU/Linux

One thing to mention: if you're running a kernel newer than v5.11, the
last corruption (the name hash mismatch) can be detected early,
before the corrupted data is written back to disk.

Thus I recommend using a newer kernel.

Thanks,
Qu

> $ btrfs fi show
> Label: none  uuid: 381bd0ef-20cb-4517-b825-d45630a6ca0a
> 	Total devices 1 FS bytes used 65.49GiB
> 	devid    1 size 111.79GiB used 111.79GiB path /dev/sdk1
>
> Label: 'storage'  uuid: e944a837-f89b-48ea-80fd-40b2bec8f21b
> 	Total devices 5 FS bytes used 6.25TiB
> 	devid    1 size 2.73TiB used 2.50TiB path /dev/sdd
> 	devid    2 size 2.73TiB used 2.50TiB path /dev/sdc
> 	devid    4 size 931.51GiB used 702.00GiB path /dev/sdf
> 	devid    6 size 3.64TiB used 3.41TiB path /dev/sdg
> 	devid    7 size 3.64TiB used 3.41TiB path /dev/sdh
>
> $ btrfs fi df /media/storage
> Data, RAID1: total=6.25TiB, used=6.24TiB
> System, RAID1: total=32.00MiB, used=944.00KiB
> Metadata, RAID1: total=10.00GiB, used=8.45GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> $ btrfs --version
> btrfs-progs v5.10.1
>
> The dmesg is attached to the email, but most of the `BTRFS critical` log
> lines related to name corruption have been removed to get the file under 200KB.
>
> Some things to note:
> - I recently upgraded the machine from Debian 9 to 11, taking the
> kernel from 4.9 to 5.10, but the issue already existed on 4.9 (it even
> started there, prompting me to replace a drive as I thought it was the
> source of the corruption).
> - The kernel is almost the vanilla Debian bullseye kernel, with an added
> (tiny) patch to fix an issue between 3Ware RAID cards and AMD Ryzen
> CPUs. It should not affect the BTRFS subsystem, as it adds a quirk to the
> PCIe subsystem.
> - I have a few name mismatches, which can be seen in the logs too. While
> I'd love to get rid of them someday, I simply moved the affected files
> into a corner for now. That's not the issue I'm trying to solve right now
> (though if someone can help, I'd be glad). They come from a ZIP archive,
> so deleting them is fine, but I can't, as I only get "Input/Output error"
> when trying to rm them.
>
> Thank you very much to whoever can help!