Re: Leaf corruption due to csum range
From: Filipe Manana <hidden>
Date: 2021-05-11 08:56:46
On Mon, May 10, 2021 at 10:01 PM Philipp Fent [off-list ref] wrote:
I encountered a btrfs error on my system. I run Microsoft SQL Server in a Docker container on a btrfs filesystem on an SSD. When bulk-loading some benchmark data, my system reproducibly enters the following failing state:

[ 366.665714] BTRFS critical (device sda): corrupt leaf: root=18446744073709551610 block=507544305664 slot=0, csum end range (308900515840) goes beyond the start range (308900384768) of the next csum item
[ 366.665723] BTRFS info (device sda): leaf 507544305664 gen 18292 total ptrs 4 free space 3 owner 18446744073709551610
[ 366.665725] item 0 key (18446744073709551606 128 308891275264) itemoff 7259 itemsize 9024
[ 366.665727] item 1 key (18446744073709551606 128 308900384768) itemoff 7067 itemsize 192
[ 366.665728] item 2 key (18446744073709551606 128 309036716032) itemoff 2587 itemsize 4480
[ 366.665730] item 3 key (18446744073709551606 128 309041303552) itemoff 103 itemsize 2484
[ 366.665731] BTRFS error (device sda): block=507544305664 write time tree block corruption detected
[ 366.665821] BTRFS: error (device sda) in btrfs_sync_log:3136: errno=-5 IO failure
[ 366.665824] BTRFS info (device sda): forced readonly

Please note the erroring ranges:
csum end: 308900515840
Start next: 308900384768
which is a difference of (1 << 17) == 0b100000000000000000 == 128KB

To me, this looks suspiciously like an off-by-one error, but I'm not too versed in debugging btrfs.
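The reported numbers can be cross-checked against the leaf dump. The following is a minimal sketch, assuming 4-byte crc32c checksums and a 4096-byte sector size (neither is stated in the report; both are common defaults). Under those assumptions each csum item covers `itemsize / 4` sectors, and the helper names `csum_end` and `find_overlaps` are made up for illustration.

```python
# Sketch: recompute each csum item's end offset from the dumped leaf.
# ASSUMPTIONS (not stated in the report): crc32c checksums (4 bytes each)
# and a 4096-byte sector size.
CSUM_SIZE = 4
SECTOR_SIZE = 4096

# (key offset, itemsize) pairs copied from the leaf dump in the report.
ITEMS = [
    (308891275264, 9024),
    (308900384768, 192),
    (309036716032, 4480),
    (309041303552, 2484),
]


def csum_end(key_offset: int, item_size: int) -> int:
    """Return the first byte past the data range covered by a csum item."""
    sectors = item_size // CSUM_SIZE
    return key_offset + sectors * SECTOR_SIZE


def find_overlaps(items):
    """Return (slot, end, next_start, overlap_bytes) for overlapping neighbours."""
    overlaps = []
    for slot in range(len(items) - 1):
        end = csum_end(*items[slot])
        next_start = items[slot + 1][0]
        if end > next_start:
            overlaps.append((slot, end, next_start, end - next_start))
    return overlaps


for slot, end, next_start, overlap in find_overlaps(ITEMS):
    print(f"slot {slot}: csum end {end} > next start {next_start} "
          f"(overlap {overlap} bytes = {overlap // 1024} KiB)")
```

With these assumptions, only slot 0 overlaps its neighbour, and the computed end offset matches the value in the kernel message, so item 0 appears to carry 32 checksums (128 KiB worth of data) too many rather than being off by a single entry.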
Most likely it's a race when adding checksums, in this case for the log tree (fsync). This has happened in the past, and the most recent fix was:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e289f03ea79bbc6574b78ac25682555423a91cbb

There were also cases that affected the csum tree rather than the log tree, but those are many years old now.
I reproduced this several times on my machine using the attached scripts. The only obvious similarity between the crashes is this 128KB csum end / start next. Sometimes I get one corrupt leaf, sometimes many. I tried to reproduce it on another machine with an HDD, but didn't encounter this error there. Can you help me to debug this further?
Try to see if there are reflink operations (clone and dedupe) done by SQL Server (or maybe Docker). If there aren't, that excludes shared extents as the cause of the problem. I'll have to look at the code and think about what might go wrong to lead to that, so I can't give exact steps on how to debug it. Thanks.
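One possible way to look for reflink operations is to trace the workload with strace and scan the output for the clone and dedupe ioctls. The following is a minimal sketch, assuming the trace was saved as text and that the request names appear decoded in it (as `FICLONE`, `FICLONERANGE`, `FIDEDUPERANGE`, or their `BTRFS_IOC_*` equivalents). `copy_file_range` is included because it may be served by a clone on btrfs. The function name and the sample lines are made up for illustration and are not real output from this system.

```python
import re

# Request and syscall names that indicate (or may indicate) extent sharing.
REFLINK_PATTERN = re.compile(
    r"\b(FICLONERANGE|FICLONE|FIDEDUPERANGE|"
    r"BTRFS_IOC_CLONE_RANGE|BTRFS_IOC_CLONE|BTRFS_IOC_FILE_EXTENT_SAME|"
    r"copy_file_range)\b"
)


def find_reflink_calls(lines):
    """Return (line number, matched name) for each line mentioning a reflink call."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = REFLINK_PATTERN.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical trace lines, for illustration only.
SAMPLE = [
    'openat(AT_FDCWD, "/data/a", O_RDONLY) = 3',
    "ioctl(4, FICLONE, 3) = 0",
    'write(4, "x", 1) = 1',
    "ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {...}) = 0",
]

for lineno, name in find_reflink_calls(SAMPLE):
    print(f"line {lineno}: {name}")
```

An empty result over a trace of the whole bulk load would suggest that shared extents are not involved, provided the trace covers all processes in the container.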
# uname -a
Linux desk 5.12.2-arch1-1 #1 SMP PREEMPT Fri, 07 May 2021 15:36:06 +0000 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v5.11.1
# btrfs fi show
Label: none uuid: 6733acf5-be40-4fe2-9d6f-819d39e49720
Total devices 1 FS bytes used 187.11GiB
devid 1 size 931.51GiB used 208.03GiB path /dev/sda
# btrfs fi df /ssdSpace
Data, single: total=207.00GiB, used=186.67GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=1.00GiB, used=450.08MiB
GlobalReserve, single: total=215.41MiB, used=0.00B

--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”