Thread: 15 messages, 3 authors, 2021-07-25

Re: bad file extent, some csum missing - how to check that restored volumes are error-free?

From: Qu Wenruo <hidden>
Date: 2021-07-17 00:25:55


On 2021/7/17 8:18 AM, Dave T wrote:
On Fri, Jul 16, 2021 at 7:06 PM Qu Wenruo [off-list ref] wrote:
quoted
So far so good, everything is working as expected.
Thank you for confirming. I learned a lot in this discussion.

quoted
Just the btrfs-check is a little paranoid.

BTW, despite the bad file extent and csum missing error, is there any
other error reported from btrfs check?
No, there was not. However... (see below)
quoted
It's a pity that we didn't get the dmesg of that RO event, it should
contain the most valuable info.

But at least so far your old fs is pretty fine, you can continue using it.
... since you don't need me to do any more testing on this fs and I
don't need the old fs anymore, I decided to experiment.

I did the following operations:

btrfs check --mode=lowmem /dev/mapper/${mydev}luks
This reported exactly the same csum issue that I showed you
previously. For example:
ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have
csum, but only 0 bytes have, expect 4096
ERROR: root 334 EXTENT_DATA[258 73728] is compressed, but inode flag
doesn't allow it
The roots and inodes appear to be the same ones reported previously.
Nothing new.
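
To compare two check runs without reading every line by eye, the error lines can be grouped by root and inode. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the lowmem-mode messages keep the "ERROR: root N EXTENT_DATA[inode offset]" shape shown above, which may differ between btrfs-progs versions.

```python
import re
from collections import defaultdict

# Matches the start of a lowmem-mode error line such as:
#   ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have
# The message format is assumed from the output quoted in this thread.
ERROR_RE = re.compile(r"ERROR: root (\d+) EXTENT_DATA\[(\d+) (\d+)\]")


def affected_extents(check_output: str) -> dict:
    """Map (root, inode) -> sorted list of file offsets that were reported."""
    found = defaultdict(set)
    for match in ERROR_RE.finditer(check_output):
        root, inode, offset = (int(group) for group in match.groups())
        found[(root, inode)].add(offset)
    return {key: sorted(offsets) for key, offsets in found.items()}


sample = """\
ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have
csum, but only 0 bytes have, expect 4096
ERROR: root 334 EXTENT_DATA[258 73728] is compressed, but inode flag
doesn't allow it
"""

print(affected_extents(sample))  # {(334, 258): [73728]}
```

If the dictionaries built from two saved outputs are equal, the same roots, inodes and offsets were reported in both runs.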

So I experimented with these operations:
# btrfs check --clear-space-cache v1 /dev/mapper/${mydev}luks
Checking filesystem on /dev/mapper/sda2luks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
Free space cache cleared
(no errors reported)
This is pretty safe, but can be slow on very large fs.
I wanted to try that on a fs I don't care about before I try it for
real. I also wanted to try the next operation.

# btrfs check --clear-ino-cache  /dev/mapper/${mydev}luks
...
Successfully cleaned up ino cache for root id: 5
Successfully cleaned up ino cache for root id: 257
Successfully cleaned up ino cache for root id: 258
(no errors reported)
Inode cache is now deprecated and rarely used. It should do nothing on
your fs anyway.
I have never used the repair option, but I decided to see what would
happen with this next operation. Maybe I should not have combined
these parameters?

# btrfs check --repair --init-csum-tree /dev/mapper/${mydev}luks
This is a little dangerous, especially since it hasn't been tested much
on filesystems with missing csums.
...
Reinitialize checksum tree
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [22921216 16384] extent item 1, found 0
backref 22921216 root 7 not referenced back 0x56524a54f850
incorrect global backref count on 22921216 found 1 wanted 0
backpointer mismatch on [22921216 16384]
owner ref check failed [22921216 16384]
repair deleting extent record: key [22921216,169,0]
Repaired extent references for 22921216
ref mismatch on [23085056 16384] extent item 1, found 0
backref 23085056 root 7 not referenced back 0x565264430000
incorrect global backref count on 23085056 found 1 wanted 0
backpointer mismatch on [23085056 16384]
owner ref check failed [23085056 16384]
repair deleting extent record: key [23085056,169,0]
... more
(The above operation reported tons of errors. Maybe I did damage to
the fs with this operation? Are any of the errors of interest to you?)
This is definitely caused by the repair, but I don't think it's a big deal.
I ran it again, but with just the --repair option:
# btrfs check --repair /dev/mapper/${mydev}luks
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/mapper/xyzluks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [21625421824 28672] extent item 17, found 16
incorrect local backref count on 21625421824 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fdc10
backref disk bytenr does not match extent record, bytenr=21625421824,
ref bytenr=0
backpointer mismatch on [21625421824 28672]
repair deleting extent record: key [21625421824,168,28672]
adding new data backref on 21625421824 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 26792 owner 359 offset 0 found 1
adding new data backref on 21625421824 parent 160677888 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 461373440 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 1761 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 280 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 326 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 26786 owner 359 offset 0 found 1
Repaired extent references for 21625421824
ref mismatch on [21625450496 4096] extent item 17, found 16
incorrect local backref count on 21625450496 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fe340
backref disk bytenr does not match extent record, bytenr=21625450496,
ref bytenr=0
backpointer mismatch on [21625450496 4096]
repair deleting extent record: key [21625450496,168,4096]
adding new data backref on 21625450496 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625450496 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625450496 root 26792 owner 369 offset 0 found 1
adding new data backref on 21625450496 parent 160677888 owner 0 offset 0 found 1
...more
It reported many, many more errors.
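
With this much output, a rough tally can be easier to read than the raw log. The sketch below is an illustration only: the function name is invented, and it assumes the "ref mismatch on [...]" and "Repaired extent references for ..." lines keep the form shown in the output above. It counts the extents reported as mismatched and lists those with no matching "Repaired" line in the captured text.

```python
import re

# Line shapes are taken from the repair output quoted in this thread;
# other btrfs-progs versions may word them differently.
MISMATCH_RE = re.compile(r"^ref mismatch on \[(\d+) (\d+)\]", re.MULTILINE)
REPAIRED_RE = re.compile(r"^Repaired extent references for (\d+)", re.MULTILINE)


def summarize_repair_log(log: str) -> dict:
    """Count mismatched extents and find those without a 'Repaired' line."""
    mismatched = {int(m.group(1)) for m in MISMATCH_RE.finditer(log)}
    repaired = {int(m.group(1)) for m in REPAIRED_RE.finditer(log)}
    return {
        "mismatched": len(mismatched),
        "repaired": len(repaired),
        "no_repaired_line": sorted(mismatched - repaired),
    }


repair_sample = """\
ref mismatch on [22921216 16384] extent item 1, found 0
backpointer mismatch on [22921216 16384]
repair deleting extent record: key [22921216,169,0]
Repaired extent references for 22921216
ref mismatch on [23085056 16384] extent item 1, found 0
repair deleting extent record: key [23085056,169,0]
"""

print(summarize_repair_log(repair_sample))
# {'mismatched': 2, 'repaired': 1, 'no_repaired_line': [23085056]}
```

An extent in "no_repaired_line" is not necessarily broken. It only means the captured text, which may be truncated, contains no "Repaired" line for it.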
At the same time, it also says it's repairing these problems.
I'm not sure if any of that
interests you. My plan now is to wipe and reuse this SSD for something
else (with a BTRFS fs of course).
That's completely fine.

But before that, would you mind running "btrfs check" again on the fs to
see if it reports any errors?

I'm interested to see the result though.
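
One way to capture such a re-check for posting is sketched below. This is an assumption-laden sketch, not a recommended procedure: the helper names, the device name and the log path are hypothetical. It also assumes that a non-zero exit status from "btrfs check" means the check did not come back clean, so the saved log should be read rather than trusting the status alone.

```python
import subprocess


def build_check_command(device: str) -> list:
    # --readonly is already the default mode of "btrfs check"; passing it
    # explicitly makes the intent obvious and keeps --repair out by habit.
    return ["btrfs", "check", "--readonly", device]


def run_readonly_check(device: str, log_path: str) -> int:
    """Run the check, save stdout and stderr to log_path, return the exit status."""
    result = subprocess.run(
        build_check_command(device),
        capture_output=True,
        text=True,
        check=False,  # a failing check should be logged, not raised
    )
    with open(log_path, "w") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode


# Example (needs root and an unmounted filesystem; the device name is hypothetical):
# status = run_readonly_check("/dev/mapper/exampleluks", "btrfs-check.log")
```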
I'm just curious about one thing. Did I create all these problems with
the repair option or were these underlying issues that were not
previously found?
They were mostly created by the repair. Since --init-csum-tree
regenerates the csums, it also causes the old csum items to no longer
match their extent items.

It's mostly expected, but normally btrfs check --repair should be able
to fix them.
If not, we need to fix btrfs-progs then.

Thanks,
Qu