Re: btrfs_free_extent
From: Qu Wenruo <hidden>
Date: 2021-12-19 23:42:20
On 2021/12/19 23:24, Tuetuopay wrote:
Hi,

I need some advice on a btrfs raid-1 volume that shows a few corruptions on some places. I have some files that triggered some safeguards on write, which ended up remounting the fs as read-only.

Over on IRC, multicore suggested me to run a readonly check, whose output is here:

# btrfs check --readonly /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
Opening filesystem to check...
Checking filesystem on /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
UUID: e944a837-f89b-48ea-80fd-40b2bec8f21b
[1/7] checking root items
[2/7] checking extents
tree backref 9882747355136 root 7 not found in extent tree
backref 9882747355136 root 23 not referenced back 0x556ea3cb07d0
This is a corruption in the extent tree: we don't have root 23 at all. Only root 7 is correct. On the other hand, 23 = 0x17 while 7 = 0x07, so the two values differ by a single bit (0x10). See the pattern? I'd therefore recommend running memtest to make sure a memory bitflip didn't cause the corruption in the first place.
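To make the pattern explicit, here is a minimal Python sketch that XORs the two root IDs from the check output above. The helper name `flipped_bits` is made up for illustration; it is not part of btrfs-progs.

```python
def flipped_bits(a: int, b: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = a ^ b
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


bogus_root = 23  # 0x17, the root the extent tree backref points at
valid_root = 7   # 0x07, the root that actually exists

bits = flipped_bits(bogus_root, valid_root)
print(hex(bogus_root ^ valid_root), bits)  # prints: 0x10 [4]
```

Exactly one differing bit is what a single bitflip would look like. It is consistent with bad memory but does not prove it, hence the memtest.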
incorrect global backref count on 9882747355136 found 2 wanted 1
backpointer mismatch on [9882747355136 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 1626695 errors 40000
Dir items with mismatch hash:
	name: fendor.qti.hardware.sigma_miracast@1.0-impl.so namelen: 46 wanted 0x12c67915 has 0x0471bc31
root 5 inode 1626696 errors 2000, link count wrong
	unresolved ref dir 1626695 index 2 namelen 46 name vendor.qti.hardware.sigma_miracast@1.0-impl.so filetype 1 errors 1, no dir item
This can also be caused by a memory bitflip. Fortunately, both cases should be repairable. But the repair should only be done after you have checked your memory. You don't want unreliable memory, which can definitely cause more damage during repair. Either way, it's still better to keep important data backed up.
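The same kind of check can be done on the two file names from the output above. This is only a sketch: `differing_bytes` is a hypothetical helper, and it compares the raw names byte by byte rather than recomputing the btrfs name hash.

```python
def differing_bytes(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (offset, xor) for every byte position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return [(i, x ^ y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Name stored in the dir item (the one with the hash mismatch).
dir_item_name = b"fendor.qti.hardware.sigma_miracast@1.0-impl.so"
# Name stored in the inode ref (the "unresolved ref").
inode_ref_name = b"vendor.qti.hardware.sigma_miracast@1.0-impl.so"

diffs = differing_bytes(dir_item_name, inode_ref_name)
print(len(dir_item_name), [(i, hex(x)) for i, x in diffs])
# prints: 46 [(0, '0x10')]
```

Only the first byte differs, and 'f' (0x66) and 'v' (0x76) are one bit apart (0x10), the same bit position as in the 23 vs 7 case.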
ERROR: errors found in fs roots
found 6870080626688 bytes used, error(s) found
total csum bytes: 6668958308
total tree bytes: 9075539968
total fs tree bytes: 1478344704
total extent tree bytes: 243793920
btree space waste bytes: 820626944
file data blocks allocated: 326941710356480
 referenced 6854941941760

They suggested that I run a non-ro check, but warned that it could do more harm than good, hence this email seeking advice. Has check any chance to fix the issue? I think I should also mention that I'm fine deleting those specific files as I can get them back somewhat easily.

To finish off, here is the information requested by the wiki page:

$ uname -a
Linux gimli 5.10.70-3ware #1 SMP Wed Dec 15 03:46:13 CET 2021 x86_64 GNU/Linux
One thing to mention: if you're running a kernel newer than v5.11, the last corruption (the name hash mismatch) can be detected early, before the corrupted data is written back to disk. Thus it's recommended to use a newer kernel.

Thanks,
Qu
$ btrfs fi show
Label: none uuid: 381bd0ef-20cb-4517-b825-d45630a6ca0a
	Total devices 1 FS bytes used 65.49GiB
	devid 1 size 111.79GiB used 111.79GiB path /dev/sdk1

Label: 'storage' uuid: e944a837-f89b-48ea-80fd-40b2bec8f21b
	Total devices 5 FS bytes used 6.25TiB
	devid 1 size 2.73TiB used 2.50TiB path /dev/sdd
	devid 2 size 2.73TiB used 2.50TiB path /dev/sdc
	devid 4 size 931.51GiB used 702.00GiB path /dev/sdf
	devid 6 size 3.64TiB used 3.41TiB path /dev/sdg
	devid 7 size 3.64TiB used 3.41TiB path /dev/sdh

$ btrfs fi df /media/storage
Data, RAID1: total=6.25TiB, used=6.24TiB
System, RAID1: total=32.00MiB, used=944.00KiB
Metadata, RAID1: total=10.00GiB, used=8.45GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ btrfs --version
btrfs-progs v5.10.1

The dmesg is attached to the email, but most of the `BTRFS critical` log lines related to name corruption have been removed to get the file to 200KB.

Some things to note:
- I recently upgraded the machine from Debian 9 to 11, getting the kernel from 4.9 to 5.10, but the issue already existed on 4.9 (it even started there, prompting me to replace a drive as I thought it to be the source of the corruption).
- The kernel is almost the vanilla debian bullseye kernel, with an added (tiny) patch to fix an issue between 3Ware RAID cards and AMD Ryzen CPUs. It should not affect the BTRFS subsystem as it adds a quirk to the PCIe subsystem.
- I have a few name mismatches, which can be seen in the logs too. While I'd love someday to get rid of them, I simply moved the affected files in a corner for now. That's not the issue I'm trying to solve now (though if someone can help, I'd be glad). They come from a ZIP archive, so deleting them is fine, but I can't as I only get "Input/Output error" when trying to rm them.

Thank you very much to whoever can help!