Re: Leaf corruption due to csum range

From: Filipe Manana <hidden>
Date: 2021-05-11 08:56:46

On Mon, May 10, 2021 at 10:01 PM Philipp Fent [off-list ref] wrote:

I encountered a btrfs error on my system. I run Microsoft SQL Server in
a docker container on a btrfs filesystem on an SSD. When bulk-loading
some benchmark data, my system reproducibly enters in the following
failing state:

[  366.665714] BTRFS critical (device sda): corrupt leaf:
root=18446744073709551610 block=507544305664 slot=0, csum end range
(308900515840) goes beyond the start range (308900384768) of the next
csum item
[  366.665723] BTRFS info (device sda): leaf 507544305664 gen 18292
total ptrs 4 free space 3 owner 18446744073709551610
[  366.665725]  item 0 key (18446744073709551606 128 308891275264)
itemoff 7259 itemsize 9024
[  366.665727]  item 1 key (18446744073709551606 128 308900384768)
itemoff 7067 itemsize 192
[  366.665728]  item 2 key (18446744073709551606 128 309036716032)
itemoff 2587 itemsize 4480
[  366.665730]  item 3 key (18446744073709551606 128 309041303552)
itemoff 103 itemsize 2484
[  366.665731] BTRFS error (device sda): block=507544305664 write time
tree block corruption detected
[  366.665821] BTRFS: error (device sda) in btrfs_sync_log:3136:
errno=-5 IO failure
[  366.665824] BTRFS info (device sda): forced readonly

Please note the erroring ranges:
csum end:   308900515840
Start next: 308900384768
which is a difference of (1 << 17) == 0b100000000000000000 == 128KB
To me, this looks suspiciously like an off-by-one error, but I'm not too
versed in debugging btrfs.

Most likely it's a race when adding checksums. In this case for the
log tree (fsync).
This has happened in the past and the most recent fix was:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e289f03ea79bbc6574b78ac25682555423a91cbb

There were cases too that affected the csum tree and not the log tree,
but those are many years old now.

I reproduced this several times on my machine using the attached
scripts. The only obvious similarity between the crashes is this 128KB
csum end / start next. Sometimes a get one corrupt leaf, sometimes many.
I tried to reproduce it on another machine with an HDD, but didn't
encounter this error there.
Can you help me to debug this further?

Try to see if there are reflink operations (clone and dedupe) done by
sql server (or maybe docker), in case there aren't, that excludes
shared extents being the cause of the problem.

I'll have to look at the code and think what might go wrong to lead to
that, so I can't say that I have exact steps on how to debug that.

Thanks.

# uname -a
Linux desk 5.12.2-arch1-1 #1 SMP PREEMPT Fri, 07 May 2021 15:36:06 +0000
x86_64 GNU/Linux
# btrfs --version
btrfs-progs v5.11.1
# btrfs fi show
Label: none  uuid: 6733acf5-be40-4fe2-9d6f-819d39e49720
        Total devices 1 FS bytes used 187.11GiB
        devid    1 size 931.51GiB used 208.03GiB path /dev/sda
# btrfs fi df /ssdSpace
Data, single: total=207.00GiB, used=186.67GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=1.00GiB, used=450.08MiB
GlobalReserve, single: total=215.41MiB, used=0.00B



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help