Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
From: Qu Wenruo <hidden>
Date: 2021-07-17 00:25:55
On 2021/7/17 8:18 AM, Dave T wrote:
On Fri, Jul 16, 2021 at 7:06 PM Qu Wenruo wrote:
So far so good, everything is working as expected.
Thank you for confirming. I learned a lot in this discussion.
It's just that btrfs check is a little paranoid. BTW, apart from the bad file extent and missing csum errors, is there any other error reported by btrfs check?
No, there was not. However... (see below)
It's a pity that we didn't get the dmesg of that RO event; it should contain the most valuable info. But at least so far your old fs is pretty fine, and you can continue using it.
... since you don't need me to do any more testing on this fs and I don't need the old fs anymore, I decided to experiment. I did the following operations:
btrfs check --mode=lowmem /dev/mapper/${mydev}luks
This reported exactly the same csum issue that I showed you previously. For example:
ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have csum, but only 0 bytes have, expect 4096
ERROR: root 334 EXTENT_DATA[258 73728] is compressed, but inode flag doesn't allow it
The roots and inodes appear to be the same ones reported previously. Nothing new.
So I experimented with these operations:
# btrfs check --clear-space-cache v1 /dev/mapper/${mydev}luks
Checking filesystem on /dev/mapper/sda2luks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
Free space cache cleared
(no errors reported)
This is pretty safe, but it can be slow on a very large fs.
I wanted to try that on a fs I don't care about before I try it for
real. I also wanted to try the next operation.
# btrfs check --clear-ino-cache /dev/mapper/${mydev}luks
...
Successfully cleaned up ino cache for root id: 5
Successfully cleaned up ino cache for root id: 257
Successfully cleaned up ino cache for root id: 258
(no errors reported)
Inode cache is now deprecated and rarely used. It should do nothing on your fs anyway.
I have never used the repair option, but I decided to see what would
happen with this next operation. Maybe I should not have combined
these parameters?
# btrfs check --repair --init-csum-tree /dev/mapper/${mydev}luks
This is a little dangerous, especially since there haven't been many experiments/tests of it being used with missing csums.
...
Reinitialize checksum tree
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [22921216 16384] extent item 1, found 0
backref 22921216 root 7 not referenced back 0x56524a54f850
incorrect global backref count on 22921216 found 1 wanted 0
backpointer mismatch on [22921216 16384]
owner ref check failed [22921216 16384]
repair deleting extent record: key [22921216,169,0]
Repaired extent references for 22921216
ref mismatch on [23085056 16384] extent item 1, found 0
backref 23085056 root 7 not referenced back 0x565264430000
incorrect global backref count on 23085056 found 1 wanted 0
backpointer mismatch on [23085056 16384]
owner ref check failed [23085056 16384]
repair deleting extent record: key [23085056,169,0]
... more
(The above operation reported tons of errors. Maybe I did damage to the fs with this operation? Are any of the errors of interest to you?)
This is definitely caused by the repair, but I don't think it's a big deal.
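The errors from the --init-csum-tree run quoted above all follow one pattern. A rough Python sketch can tally them by owning tree and by the key type of each record the repair deleted. This is not part of btrfs-progs, and the helper `summarize` and the excerpted `LOG` are illustrative. The sketch assumes the standard btrfs on-disk constants: tree id 7 is the checksum tree, key type 168 is EXTENT_ITEM, and key type 169 is METADATA_ITEM. If every unreferenced block belongs to root 7, that would fit the explanation that these are leftovers of the old checksum tree. They would still be recorded in the extent tree after --init-csum-tree rebuilt the tree.

```python
import re
from collections import Counter

# Assumed btrfs on-disk constants (standard values, not taken from this thread):
CSUM_TREE_OBJECTID = 7  # tree id of the checksum tree
KEY_TYPE_NAMES = {168: "EXTENT_ITEM", 169: "METADATA_ITEM"}

# Excerpt of the `btrfs check --repair --init-csum-tree` output quoted above.
LOG = """\
ref mismatch on [22921216 16384] extent item 1, found 0
backref 22921216 root 7 not referenced back 0x56524a54f850
incorrect global backref count on 22921216 found 1 wanted 0
backpointer mismatch on [22921216 16384]
owner ref check failed [22921216 16384]
repair deleting extent record: key [22921216,169,0]
Repaired extent references for 22921216
ref mismatch on [23085056 16384] extent item 1, found 0
backref 23085056 root 7 not referenced back 0x565264430000
incorrect global backref count on 23085056 found 1 wanted 0
backpointer mismatch on [23085056 16384]
owner ref check failed [23085056 16384]
repair deleting extent record: key [23085056,169,0]
"""

# "backref <bytenr> root <tree id> not referenced ..."
UNREFERENCED = re.compile(r"backref (\d+) root (\d+) not referenced")
# "repair deleting extent record: key [<bytenr>,<key type>,<offset>]"
DELETED = re.compile(r"repair deleting extent record: key \[(\d+),(\d+),(\d+)\]")


def summarize(log: str) -> tuple[Counter, Counter]:
    """Count the owning tree of each unreferenced block and the key type of each deleted record."""
    owners = Counter(int(root) for _bytenr, root in UNREFERENCED.findall(log))
    deleted = Counter(
        KEY_TYPE_NAMES.get(int(key_type), key_type)
        for _bytenr, key_type, _offset in DELETED.findall(log)
    )
    return owners, deleted


if __name__ == "__main__":
    owners, deleted = summarize(LOG)
    print("owning trees:", dict(owners))
    print("deleted record types:", dict(deleted))
    print("only the checksum tree involved:", set(owners) == {CSUM_TREE_OBJECTID})
```

You could feed the full saved output of that run into the same tally. That would show whether any records outside root 7 were touched.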
I ran it again, but with just the --repair option:
# btrfs check --repair /dev/mapper/${mydev}luks
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/mapper/xyzluks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [21625421824 28672] extent item 17, found 16
incorrect local backref count on 21625421824 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fdc10
backref disk bytenr does not match extent record, bytenr=21625421824,
ref bytenr=0
backpointer mismatch on [21625421824 28672]
repair deleting extent record: key [21625421824,168,28672]
adding new data backref on 21625421824 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 26792 owner 359 offset 0 found 1
adding new data backref on 21625421824 parent 160677888 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 461373440 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 1761 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 280 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 326 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 26786 owner 359 offset 0 found 1
Repaired extent references for 21625421824
ref mismatch on [21625450496 4096] extent item 17, found 16
incorrect local backref count on 21625450496 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fe340
backref disk bytenr does not match extent record, bytenr=21625450496,
ref bytenr=0
backpointer mismatch on [21625450496 4096]
repair deleting extent record: key [21625450496,168,4096]
adding new data backref on 21625450496 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625450496 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625450496 root 26792 owner 369 offset 0 found 1
adding new data backref on 21625450496 parent 160677888 owner 0 offset 0 found 1
...more
It reported many, many more errors.
At the same time, it also says it's repairing these problems.
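The first extent in the plain --repair output quoted above shows what "repairing" means here. The extent item claims 17 references, but check finds only 16. The repair then deletes the extent record and re-adds data backrefs one at a time. A hypothetical Python sketch (helper and variable names are illustrative) can cross-check this from the log text. It counts the re-added backrefs and looks for the one reported as "found 0 wanted 1". It tolerates the line wrapping added by the mail client. The result would be consistent with the repair simply rebuilding the backrefs it could actually find and dropping the dangling one.

```python
import re
from collections import Counter

# Excerpt of the `btrfs check --repair` output quoted above (first extent only).
LOG = """\
ref mismatch on [21625421824 28672] extent item 17, found 16
incorrect local backref count on 21625421824 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fdc10
backref disk bytenr does not match extent record, bytenr=21625421824,
ref bytenr=0
backpointer mismatch on [21625421824 28672]
repair deleting extent record: key [21625421824,168,28672]
adding new data backref on 21625421824 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 26792 owner 359 offset 0 found 1
adding new data backref on 21625421824 parent 160677888 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 461373440 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 1761 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 280 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 326 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 26786 owner 359 offset 0 found 1
Repaired extent references for 21625421824
"""

MISMATCH = re.compile(r"ref mismatch on \[(\d+) \d+\] extent item (\d+), found (\d+)")
# \s+ lets the pattern span the mail client's line wraps.
DANGLING = re.compile(
    r"incorrect local backref count on (\d+) parent (\d+)\s+owner\s+\d+\s+offset\s+\d+\s+found 0 wanted"
)
# The id is either a parent block bytenr or a root id; mixing them is fine for this rough check.
READDED = re.compile(r"adding new data backref on (\d+) (?:parent|root) (\d+)")


def backref_summary(log: str) -> dict[int, dict]:
    """Per extent: refs claimed by the extent item, refs check found, and what the repair did."""
    readded_pairs = [(int(b), int(i)) for b, i in READDED.findall(log)]
    readded_count = Counter(b for b, _ in readded_pairs)
    readded_set = set(readded_pairs)
    summary = {}
    for bytenr, claimed, found in MISMATCH.findall(log):
        b = int(bytenr)
        dangling = [int(p) for x, p in DANGLING.findall(log) if int(x) == b]
        summary[b] = {
            "claimed": int(claimed),
            "found": int(found),
            "readded": readded_count[b],
            "dangling_dropped": [p for p in dangling if (b, p) not in readded_set],
        }
    return summary


if __name__ == "__main__":
    for bytenr, info in backref_summary(LOG).items():
        print(bytenr, info)
```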
I'm not sure if any of that interests you. My plan now is to wipe and reuse this SSD for something else (with a BTRFS fs of course).
That's completely fine. But before that, would you mind running "btrfs check" again on the fs to see whether it reports any errors? I'm interested in the result.
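The re-check requested here should be run without --repair, so it cannot change the fs further. A minimal Python sketch of such a check follows. It assumes btrfs-progs is installed and that it runs as root against the unmounted device. It also assumes that a non-zero exit status from btrfs check means it found problems; that is worth confirming against the btrfs-check man page. The sketch runs both the default mode and --mode=lowmem, since the lowmem run quoted earlier is the one that reported the csum errors. The script name recheck.py and the function names are hypothetical.

```python
import subprocess
import sys


def check_argv(device: str, lowmem: bool = False) -> list[str]:
    """Build a read-only `btrfs check` command line (never --repair)."""
    argv = ["btrfs", "check", "--readonly"]
    if lowmem:
        argv.append("--mode=lowmem")  # the mode that reported the csum errors earlier
    argv.append(device)
    return argv


def fs_looks_clean(device: str) -> bool:
    """Run both check modes; assumes a non-zero exit status means errors were reported."""
    for lowmem in (False, True):
        result = subprocess.run(check_argv(device, lowmem), capture_output=True, text=True)
        if result.returncode != 0:
            print(result.stdout, result.stderr, sep="\n")
            return False
    return True


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("clean" if fs_looks_clean(sys.argv[1]) else "errors reported")
    else:
        print("usage: python3 recheck.py /dev/mapper/<device>")
```

Usage would be something like `python3 recheck.py /dev/mapper/${mydev}luks` with the fs unmounted.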
I'm just curious about one thing. Did I create all these problems with the repair option or were these underlying issues that were not previously found?
They are mostly created by the repair. Since --init-csum-tree regenerates the csums, it also causes the old csum items to mismatch their extent items. This is mostly expected, and normally btrfs check --repair should be able to fix them. If not, then we need to fix btrfs-progs.
Thanks,
Qu