Re: Leaf corruption due to csum range
From: Qu Wenruo <hidden>
Date: 2021-05-11 08:44:43
On 2021/5/11 4:18 PM, Wang Yugui wrote:
Hi, the last 'write time tree block corruption detected' is marked as a memory ECC error.
So ECC can fail to recover from a bitflip? I can't even rely on ECC memory nowadays?

(At least tree-check rocks again.)

Thanks,
Qu
From: chil L1n [off-list ref]
To: linux-btrfs@vger.kernel.org
Date: Sat, 6 Mar 2021 10:10:11 +0100
Subject: btrfs error: write time tree block corruption detected

Is this a server with ECC memory?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/05/11
I encountered a btrfs error on my system. I run Microsoft SQL Server in a docker container on a btrfs filesystem on an SSD. When bulk-loading some benchmark data, my system reproducibly enters the following failing state:

[ 366.665714] BTRFS critical (device sda): corrupt leaf: root=18446744073709551610 block=507544305664 slot=0, csum end range (308900515840) goes beyond the start range (308900384768) of the next csum item
[ 366.665723] BTRFS info (device sda): leaf 507544305664 gen 18292 total ptrs 4 free space 3 owner 18446744073709551610
[ 366.665725] item 0 key (18446744073709551606 128 308891275264) itemoff 7259 itemsize 9024
[ 366.665727] item 1 key (18446744073709551606 128 308900384768) itemoff 7067 itemsize 192
[ 366.665728] item 2 key (18446744073709551606 128 309036716032) itemoff 2587 itemsize 4480
[ 366.665730] item 3 key (18446744073709551606 128 309041303552) itemoff 103 itemsize 2484
[ 366.665731] BTRFS error (device sda): block=507544305664 write time tree block corruption detected
[ 366.665821] BTRFS: error (device sda) in btrfs_sync_log:3136: errno=-5 IO failure
[ 366.665824] BTRFS info (device sda): forced readonly

Please note the erroring ranges:
csum end:   308900515840
Start next: 308900384768
which is a difference of (1 << 17) == 0b100000000000000000 == 128KB

To me, this looks suspiciously like an off-by-one error, but I'm not too versed in debugging btrfs.

I reproduced this several times on my machine using the attached scripts. The only obvious similarity between the crashes is this 128KB difference between csum end and start next. Sometimes I get one corrupt leaf, sometimes many. I tried to reproduce it on another machine with an HDD, but didn't encounter this error there.

Can you help me to debug this further?
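For readers following the numbers, here is a rough sketch of how the reported "csum end range" can be recomputed from the item 0 key offset and itemsize in the log. It assumes 4 KiB sectors and 4-byte crc32c checksums, which the report does not state; the helper name csum_end is made up for illustration and is not a kernel function. It also counts how many bits differ between the computed end and the key offset of item 1.

```python
# Assumptions (not stated in the report): 4 KiB sectors, 4-byte crc32c checksums.
SECTORSIZE = 4096
CSUM_SIZE = 4


def csum_end(key_offset: int, item_size: int) -> int:
    """Return the first byte past the data range covered by one csum item.

    Hypothetical helper for illustration: each CSUM_SIZE bytes of item
    payload are taken to describe one SECTORSIZE-byte sector of data.
    """
    return key_offset + (item_size // CSUM_SIZE) * SECTORSIZE


# (key offset, itemsize) for items 0 and 1, copied from the dmesg output.
item0 = (308891275264, 9024)
item1 = (308900384768, 192)

end0 = csum_end(*item0)                  # where item 0's coverage ends
overlap = end0 - item1[0]                # how far it runs past item 1's start
differing_bits = bin(end0 ^ item1[0]).count("1")

print(end0)            # 308900515840, same as "csum end range" in the log
print(overlap)         # 131072, i.e. 1 << 17
print(differing_bits)  # 1
```

Under these assumptions the two offsets differ in exactly one bit (bit 17). That is what a single flipped bit in the key offset of item 1 would look like, though the arithmetic alone cannot tell a memory error apart from a software bug.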
# uname -a
Linux desk 5.12.2-arch1-1 #1 SMP PREEMPT Fri, 07 May 2021 15:36:06 +0000 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v5.11.1

# btrfs fi show
Label: none  uuid: 6733acf5-be40-4fe2-9d6f-819d39e49720
        Total devices 1 FS bytes used 187.11GiB
        devid    1 size 931.51GiB used 208.03GiB path /dev/sda

# btrfs fi df /ssdSpace
Data, single: total=207.00GiB, used=186.67GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=1.00GiB, used=450.08MiB
GlobalReserve, single: total=215.41MiB, used=0.00B