Thread (7 messages), 3 authors, 2021-12-21

Re: btrfs_free_extent

From: Qu Wenruo <hidden>
Date: 2021-12-21 02:34:58


On 2021/12/21 10:05, Tuetuopay wrote:
Hi,

Thank you so much for your advice. The check in repair mode did indeed
work without issue, and the issues I had with the files now seem gone.
I'm stress-testing the drives a bit right now to see if everything's solved,
but it looks like it.
To be safe, another read-only btrfs-check would tell you if all the 
metadata problems are gone.

And for data correctness, scrub is always the way to go.

Thanks,
Qu
Cheers!
Alexis

On Dec. 21 2021, at 12:20 am, Qu Wenruo [off-list ref] wrote:
quoted
On 2021/12/21 04:30, Tuetuopay wrote:
quoted
Hi,
  
It's me again. I have completed several memtest86+ passes without errors
whatsoever, so this RAM can be considered good. Also, following your
advice, I built and upgraded the kernel to the latest stable, i.e. 5.15.10.
  
What is the next step to (hopefully) fix the error? Is it to run `btrfs
check`, but not in readonly mode? I think I'll need to upgrade
btrfs-progs too since I'm now running 5.15.10 instead of 5.10.70.
  
Yes, latest btrfs-progs is always recommended.
  
After backing up the important data and upgrading btrfs-progs, "btrfs
check --repair" could at least solve the extent tree problem.
  
Thanks,
Qu
quoted
  
Thank you so much in advance!
  
Alexis
  
On Dec. 20 2021, at 10:35 am, Tuetuopay [off-list ref] wrote:
quoted
Hi, thanks for the swift reply!
  
On Dec. 20 2021, at 12:42 am, Qu Wenruo [off-list ref] wrote:
quoted
On 2021/12/19 23:24, Tuetuopay wrote:
quoted
Hi,
  
I need some advice on a btrfs raid-1 volume that shows corruption
in a few places. I have some files that triggered some safeguards on
write, which ended up remounting the fs as read-only.
  
Over on IRC, multicore suggested that I run a readonly check, whose
output is here:
  
# btrfs check --readonly /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
Opening filesystem to check...
Checking filesystem on /dev/disk/by-uuid/e944a837-f89b-48ea-80fd-40b2bec8f21b
UUID: e944a837-f89b-48ea-80fd-40b2bec8f21b
[1/7] checking root items
[2/7] checking extents
tree backref 9882747355136 root 7 not found in extent tree
backref 9882747355136 root 23 not referenced back 0x556ea3cb07d0
  
This is one corruption in extent tree, we don't have root 23 at all.
Only root 7 is correct.
  
On the other hand, 23 = 0x17, while 7 = 0x07.
  
So, see a pattern here?
  
Thus I recommend running memtest to make sure it's not a memory bitflip
causing the corruption in the first place.
  
That definitely looks like a bitflip to me.
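
The pattern hinted at above can be checked with a quick XOR. The following is only an illustrative sketch, not part of btrfs-progs: the helper name `flipped_bits` is made up for this example, and the two values are the root IDs quoted from the check output.

```python
def flipped_bits(a: int, b: int) -> list:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


wanted_root = 7   # root 7, the correct owner according to the reply above
found_root = 23   # root 23, which does not exist on this filesystem

bits = flipped_bits(wanted_root, found_root)
print(f"{wanted_root:#04x} vs {found_root:#04x}: differing bits {bits}")
```

Exactly one differing bit (bit 4, mask 0x10) is what a single memory bitflip would produce, which is why a memtest comes before any repair attempt.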
  
quoted
quoted
incorrect global backref count on 9882747355136 found 2 wanted 1
backpointer mismatch on [9882747355136 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 1626695 errors 40000
Dir items with mismatch hash:
	name: fendor.qti.hardware.sigma_miracast@1.0-impl.so namelen: 46 wanted 0x12c67915 has 0x0471bc31
root 5 inode 1626696 errors 2000, link count wrong
	unresolved ref dir 1626695 index 2 namelen 46 name vendor.qti.hardware.sigma_miracast@1.0-impl.so filetype 1 errors 1, no dir item
  
This can also be caused by a memory bitflip.
  
Fortunately, both cases should be repairable.
But that should only be done after you have checked your memory.
You don't want unreliable memory, which could cause even more
damage during repair.
  
But it's still better to keep important data backed up.
  
Yes, definitely a bitflip, f = 0x66 and v = 0x76.
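
The same check works on the two file names from the check output. This is a small sketch under the assumption that the names are plain ASCII; `diff_bytes` is a hypothetical helper written for this example, not a btrfs tool.

```python
def diff_bytes(a: bytes, b: bytes) -> list:
    """Return (index, byte_a, byte_b, xor) for each position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return [(i, x, y, x ^ y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Names copied from the "Dir items with mismatch hash" and
# "unresolved ref" lines of the check output.
on_disk = b"fendor.qti.hardware.sigma_miracast@1.0-impl.so"
expected = b"vendor.qti.hardware.sigma_miracast@1.0-impl.so"

for index, got, want, xor in diff_bytes(on_disk, expected):
    single = bin(xor).count("1") == 1
    print(f"byte {index}: {got:#04x} vs {want:#04x}, xor {xor:#04x}, single bit: {single}")
```

Only the first byte differs, and the XOR is 0x10, the same single-bit mask that separates root 7 from root 23.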
  
quoted
quoted
ERROR: errors found in fs roots
found 6870080626688 bytes used, error(s) found
total csum bytes: 6668958308
total tree bytes: 9075539968
total fs tree bytes: 1478344704
total extent tree bytes: 243793920
btree space waste bytes: 820626944
file data blocks allocated: 326941710356480
    referenced 6854941941760
  
They suggested that I run a non-ro check, but warned that it could do
more harm than good, hence this email seeking advice. Does check have
any chance of fixing the issue?
  
I think I should also mention that I'm fine deleting those specific
files as I can get them back somewhat easily.
  
To finish off, here is the information requested by the wiki page:
  
$ uname -a
Linux gimli 5.10.70-3ware #1 SMP Wed Dec 15 03:46:13 CET 2021 x86_64 GNU/Linux
  
One thing to mention: if you're running a kernel newer than v5.11, the
last corruption (the name hash mismatch) can be detected early,
without writing the corrupted data back to disk.
  
Thus it's recommended to use a newer kernel.
  
Amazing advice. I'll definitely upgrade the kernel, likely latest.
  
quoted
Thanks,
Qu
  
Thank you very much! I just started a full memtest on the
machine. I expect it to be good, since the RAM is brand new (just
swapped the whole system due to the previous motherboard dying), but you
never know. I'll get back to you with the results!
  
Also, if I can get my hands on a DDR3 system, I'll test the old ram to
be sure. If this ends up being a RAM issue, I'll send back the current
one and buy some ECC memory.
  
Thanks,
Alexis
  
quoted
quoted
$ btrfs fi show
Label: none  uuid: 381bd0ef-20cb-4517-b825-d45630a6ca0a
	Total devices 1 FS bytes used 65.49GiB
	devid    1 size 111.79GiB used 111.79GiB path /dev/sdk1
  
Label: 'storage'  uuid: e944a837-f89b-48ea-80fd-40b2bec8f21b
	Total devices 5 FS bytes used 6.25TiB
	devid    1 size 2.73TiB used 2.50TiB path /dev/sdd
	devid    2 size 2.73TiB used 2.50TiB path /dev/sdc
	devid    4 size 931.51GiB used 702.00GiB path /dev/sdf
	devid    6 size 3.64TiB used 3.41TiB path /dev/sdg
	devid    7 size 3.64TiB used 3.41TiB path /dev/sdh
  
$ btrfs fi df /media/storage
Data, RAID1: total=6.25TiB, used=6.24TiB
System, RAID1: total=32.00MiB, used=944.00KiB
Metadata, RAID1: total=10.00GiB, used=8.45GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
$ btrfs --version
btrfs-progs v5.10.1
  
The dmesg is attached to the email, but most of the `BTRFS critical` log
lines related to name corruption have been removed to get the file
to 200KB.
  
Some things to note:
- I recently upgraded the machine from Debian 9 to 11, getting the
kernel from 4.9 to 5.10, but the issue already existed on 4.9 (it even
started there, prompting me to replace a drive as I thought it to be the
source of the corruption).
- The kernel is almost the vanilla debian bullseye kernel, with an added
(tiny) patch to fix an issue between 3Ware RAID cards and AMD Ryzen
CPUs. It should not affect the BTRFS subsystem as it adds a quirk to the
PCIe subsystem.
- I have a few name mismatches, which can be seen in the logs too. While
I'd love someday to get rid of them, I simply moved the affected files
in a corner for now. That's not the issue I'm trying to solve now
(though if someone can help, I'd be glad). They come from a ZIP archive,
so deleting them is fine, but I can't as I only get "Input/Output error"
when trying to rm them.
  
Thank you very much to whoever can help!
  
  