Re: btrfs_free_extent
From: Qu Wenruo <hidden>
Date: 2021-12-21 02:34:58
On 2021/12/21 10:05, Tuetuopay wrote:
Hi,

Thank you so much for your advice. The check in repair mode did indeed work without issue, and the issues I had with the files now seem to be gone. I'm stressing the drives a bit right now to see if everything's solved, but it looks like it.
To be safe, another read-only btrfs-check would tell you if all the metadata problems are gone. And for data correctness, scrub is always the way to go.

Thanks,
Qu
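The two follow-up checks suggested here could be scripted roughly as follows. This is only a sketch: the function name `verification_commands` is made up for illustration, and the device and mount point are the ones quoted further down in this thread. It only builds the command lines and does not run them. Running them requires root; `btrfs check` expects the filesystem to be unmounted, while `btrfs scrub` works on a mounted filesystem.

```python
import shlex
from typing import List


def verification_commands(device: str, mountpoint: str) -> List[List[str]]:
    """Build (but do not run) the two verification commands as argv lists."""
    return [
        # Metadata verification; run while the filesystem is unmounted.
        ["btrfs", "check", "--readonly", device],
        # Data checksum verification; run on the mounted filesystem.
        # -B keeps scrub in the foreground and prints statistics at the end.
        ["btrfs", "scrub", "start", "-B", mountpoint],
    ]


if __name__ == "__main__":
    # Device and mount point taken from the original report (assumed unchanged).
    commands = verification_commands(
        "/dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b",
        "/media/storage",
    )
    for argv in commands:
        print(shlex.join(argv))
```

Keeping the commands as argv lists avoids shell quoting problems if they are later passed to `subprocess.run`.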
Cheers!
Alexis

On Dec. 21 2021, at 12:20 am, Qu Wenruo [off-list ref] wrote:
On 2021/12/21 04:30, Tuetuopay wrote:
Hi,

It's me again. I have completed several memtest86+ passes without any errors, so this RAM can be considered good. Also, following your advice, I built and upgraded the kernel to the latest stable, i.e. 5.15.10.

What is the next step to (hopefully) fix the error? Is it to run `btrfs check`, but not in read-only mode? I think I'll need to upgrade btrfs-progs too since I'm now running 5.15.10 instead of 5.10.70.

Yes, latest btrfs-progs is always recommended. After backing up the important data and upgrading btrfs-progs, "btrfs check --repair" could at least solve the extent tree problem.

Thanks,
Qu
Thank you so much in advance!
Alexis

On Dec. 20 2021, at 10:35 am, Tuetuopay [off-list ref] wrote:
Hi, thanks for the swift reply!

On Dec. 20 2021, at 12:42 am, Qu Wenruo [off-list ref] wrote:
On 2021/12/19 23:24, Tuetuopay wrote:
Hi,

I need some advice on a btrfs raid-1 volume that shows corruption in a few places. I have some files that triggered some safeguards on write, which ended up remounting the fs as read-only. Over on IRC, multicore suggested that I run a read-only check, whose output is here:

# btrfs check --readonly /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
Opening filesystem to check...
Checking filesystem on /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
UUID: e944a837-f89b-48ea-80fd-40b2bec8f21b
[1/7] checking root items
[2/7] checking extents
tree backref 9882747355136 root 7 not found in extent tree
backref 9882747355136 root 23 not referenced back 0x556ea3cb07d0

This is one corruption in the extent tree: we don't have root 23 at all. Only root 7 is correct. On the other hand, 23 = 0x17, while 7 = 0x07. So, see a pattern here? Thus I recommend running memtest to make sure it's not a memory bitflip causing the corruption in the first place.

That definitely looks like a bitflip to me.
incorrect global backref count on 9882747355136 found 2 wanted 1
backpointer mismatch on [9882747355136 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 1626695 errors 40000
Dir items with mismatch hash:
    name: fendor.qti.hardware.sigma_miracast@1.0-impl.so namelen: 46 wanted 0x12c67915 has 0x0471bc31
root 5 inode 1626696 errors 2000, link count wrong
    unresolved ref dir 1626695 index 2 namelen 46 name vendor.qti.hardware.sigma_miracast@1.0-impl.so filetype 1 errors 1, no dir item

This can also be caused by a memory bitflip. Fortunately, both cases should be repairable. But that should only be done after you have checked your memory. You don't want unreliable memory, which can definitely cause more damage during repair. But it's still better to keep important data backed up.

Yes, definitely a bitflip, f = 0x66 and v = 0x76.
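The bitflip reasoning for both corruptions (root 23 versus root 7, and "f" versus "v" in the file name) can be checked with a few lines of Python. This is a minimal sketch; the helper name `flipped_bits` is made up for illustration and is not part of any btrfs tool.

```python
from typing import List


def flipped_bits(a: int, b: int) -> List[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = a ^ b  # XOR leaves a 1 exactly where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Root id recorded in the extent tree versus the root that actually exists.
print(flipped_bits(23, 7))               # [4]

# First character of the file name: 'f' (0x66) versus 'v' (0x76).
print(flipped_bits(ord("f"), ord("v")))  # [4]
```

In both cases exactly one bit differs, and it is the same position within the byte (value 0x10). That is consistent with a single-bit memory error, although it does not prove one on its own.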
ERROR: errors found in fs roots
found 6870080626688 bytes used, error(s) found
total csum bytes: 6668958308
total tree bytes: 9075539968
total fs tree bytes: 1478344704
total extent tree bytes: 243793920
btree space waste bytes: 820626944
file data blocks allocated: 326941710356480 referenced 6854941941760

They suggested that I run a non-ro check, but warned that it could do more harm than good, hence this email seeking advice. Does check have any chance of fixing the issue? I think I should also mention that I'm fine deleting those specific files as I can get them back somewhat easily.

To finish off, here is the information requested by the wiki page:

$ uname -a
Linux gimli 5.10.70-3ware #1 SMP Wed Dec 15 03:46:13 CET 2021 x86_64 GNU/Linux

One thing to mention is, if you're running a kernel newer than v5.11, the last corruption (the name hash mismatch) can be detected early, without writing the corrupted data back to disk. Thus it's recommended to use a newer kernel.

Amazing advice. I'll definitely upgrade the kernel, likely to the latest.
Thanks,
Qu

Thank you very much! I just started a full memtest on the machine. I expect it to be good, since the RAM is brand new (I just swapped the whole system due to the previous motherboard dying), but you never know. I'll get back to you with the results! Also, if I can get my hands on a DDR3 system, I'll test the old RAM to be sure. If this ends up being a RAM issue, I'll send back the current one and buy some ECC memory.

Thanks,
Alexis
$ btrfs fi show
Label: none uuid: 381bd0ef-20cb-4517-b825-d45630a6ca0a
    Total devices 1 FS bytes used 65.49GiB
    devid 1 size 111.79GiB used 111.79GiB path /dev/sdk1

Label: 'storage' uuid: e944a837-f89b-48ea-80fd-40b2bec8f21b
    Total devices 5 FS bytes used 6.25TiB
    devid 1 size 2.73TiB used 2.50TiB path /dev/sdd
    devid 2 size 2.73TiB used 2.50TiB path /dev/sdc
    devid 4 size 931.51GiB used 702.00GiB path /dev/sdf
    devid 6 size 3.64TiB used 3.41TiB path /dev/sdg
    devid 7 size 3.64TiB used 3.41TiB path /dev/sdh

$ btrfs fi df /media/storage
Data, RAID1: total=6.25TiB, used=6.24TiB
System, RAID1: total=32.00MiB, used=944.00KiB
Metadata, RAID1: total=10.00GiB, used=8.45GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ btrfs --version
btrfs-progs v5.10.1

The dmesg is attached to the email, but most of the `BTRFS critical` log lines related to name corruption have been removed to get the file down to 200KB.

Some things to note:
- I recently upgraded the machine from Debian 9 to 11, taking the kernel from 4.9 to 5.10, but the issue already existed on 4.9 (it even started there, prompting me to replace a drive as I thought it was the source of the corruption).
- The kernel is almost the vanilla Debian bullseye kernel, with an added (tiny) patch to fix an issue between 3Ware RAID cards and AMD Ryzen CPUs. It should not affect the BTRFS subsystem as it adds a quirk to the PCIe subsystem.
- I have a few name mismatches, which can be seen in the logs too. While I'd love to get rid of them someday, I simply moved the affected files into a corner for now. That's not the issue I'm trying to solve now (though if someone can help, I'd be glad). They come from a ZIP archive, so deleting them is fine, but I can't, as I only get "Input/Output error" when trying to rm them.

Thank you very much to whoever can help!