Re: csum failed, bad tree, block, IO failures. Is my drive dead or has my BTRFS broke itself?
From: Qu Wenruo <hidden>
Date: 2021-10-16 03:30:27
On 2021/10/16 11:18, James Harvey wrote:
I have attached the full journalctl from the boot where this first happened. Note that this happened again after a scrub and a reboot during a different write operation. I'm currently doing a backup (not overwriting any of my other backups), so I will do a memory test to see if I have bad RAM. I don't have ECC memory so I can't easily check.
With the full dmesg, it's much clearer how corrupted the fs is:

> kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 12255248384 csum 0xd6230a4c expected csum 0x723d189a mirror 1

The previous errors are mostly data corruption. So far there is still no idea how corrupted it is or what's going wrong.

But the next ones give us quite some clue:

> BTRFS error (device sdb1): bad tree block start, want 9344471629824 have 5162927840984877996

The bytenr we got is complete garbage. This means some (in fact quite a few) metadata blocks have been completely overwritten with garbage or whatever.

Considering the context, it looks like the csum tree has some big corruption. And it's not a common symptom of a memory bitflip, but really corrupted data on disk.

btrfs-check should detect such a problem; if not, you can try "btrfs check --check-data-csum", which should throw tons of corruption.

I have no idea how this could happen; maybe disk corruption, or maybe some other problem.

Thanks,
Qu
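To illustrate why the "want/have" pair above does not look like a memory bitflip, here is a minimal Python sketch. It is an editorial illustration, not part of btrfs or btrfs-progs: the helper names (`parse_bad_tree_block`, `differing_bits`, `is_single_bit_flip`) are hypothetical, and the heuristic simply assumes that a classic bitflip changes exactly one bit of the expected value.

```python
import re

# Matches the kernel message format quoted in this thread.
BAD_TREE = re.compile(r"bad tree block start, want (\d+) have (\d+)")


def parse_bad_tree_block(line):
    """Return (want, have) as integers, or None if the line does not match."""
    m = BAD_TREE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def differing_bits(want, have):
    """Number of bit positions in which the two values differ."""
    return bin(want ^ have).count("1")


def is_single_bit_flip(want, have):
    """Heuristic (an assumption): a bitflip changes exactly one bit."""
    return differing_bits(want, have) == 1


line = ("BTRFS error (device sdb1): bad tree block start, "
        "want 9344471629824 have 5162927840984877996")
want, have = parse_bad_tree_block(line)
print("differing bits:", differing_bits(want, have))
print("single bit flip:", is_single_bit_flip(want, have))
```

For the message quoted above, the two values differ in more than one bit, which fits the reading that the block was overwritten rather than hit by a single flipped bit. The check says nothing about where the garbage came from.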
On Sat, 16 Oct 2021 at 02:52, Qu Wenruo [off-list ref] wrote:
On 2021/10/16 08:14, James Harvey wrote:
> My server consists of a single 16TB external drive (I have backups, and I was planning to make a proper server at some point) and I used BTRFS for the drive's filesystem. Recently, the file system would go into read only and put a load of errors into the system logs. Running a BTRFS scrub returned no errors, a readonly BTRFS check returned no errors, and a SMART check showed no issues/bad sectors.

This is very strange, as normally if there is really on-disk corruption, especially in metadata, btrfs check should detect it.
> Has BTRFS broken itself, or is this a drive issue? Here are the errors:

Could you please provide the full dmesg? We want the context to get a whole picture of the problem, not just the error messages from btrfs.

If the problem only happens at write time, maybe you want to do a memory test in the meantime to verify it's not some bitflip in your memory.

Thanks,
Qu
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105460736 csum 0x75ab540e expected csum 0xaeb99694 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105464832 csum 0xe83b4c2a expected csum 0xb9a65172 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105468928 csum 0x4769b37a expected csum 0x3598cf9e mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105473024 csum 0x7c39a990 expected csum 0x9c523a6c mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105477120 csum 0xfedc09f1 expected csum 0x68386e9a mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105481216 csum 0xf9f25835 expected csum 0x96d2dea3 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105485312 csum 0x37643155 expected csum 0x6139f8a1 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105489408 csum 0x13893c06 expected csum 0xb28c00a8 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105493504 csum 0x2a89fcff expected csum 0x4c5758ed mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105497600 csum 0x7484b77c expected csum 0x0a9f3138 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343806013440 have 757116834938933
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9622003011584, 9622003015680)
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343806013440 have 757116834938933
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9622003015680, 9622003019776)
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343947784192 have 17536680014548819927
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343947784192 have 17536680014548819927
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9644356001792, 9644356005888)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9622003019776, 9622003023872)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9644356005888, 9644356009984)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9622003023872, 9622003027968)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9633973551104, 9633973555200)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9644356009984, 9644356014080)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9622003027968, 9622003032064)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum hole found for disk bytenr range [9633973555200, 9633973559296)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected csum 0xc096fec5 mirror 1
Oct 14 21:50:41 James-Server kernel: BTRFS: error (device sdb1) in btrfs_finish_ordered_io:3064: errno=-5 IO failure
Oct 14 21:50:41 James-Server kernel: BTRFS info (device sdb1): forced readonly

uname -a:
Linux James-Server 5.14.11-arch1-1 #1 SMP PREEMPT Sun, 10 Oct 2021 00:48:26 +0000 x86_64 GNU/Linux

btrfs --version:
btrfs-progs v5.14.2

btrfs fi show:
Label: 'Seagate 16TB 1'  uuid: e183a876-95e0-4d15-a641-69f4a8e8e7e7
        Total devices 1 FS bytes used 9.61TiB
        devid 1 size 14.55TiB used 9.62TiB path /dev/sdb1

btrfs fi df:
Data, single: total=9.60TiB, used=9.60TiB
System, DUP: total=8.00MiB, used=1.09MiB
Metadata, DUP: total=11.00GiB, used=10.74GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Mount options:
rw,noatime,compress=zstd:3,space_cache=v2,autodefrag,subvolid=5,subvol=/
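
A log like the one above is easier to reason about once it is condensed. The following Python sketch is an editorial illustration only: the `summarize` function and the `SAMPLE` list are hypothetical names, the regular expressions are derived solely from the message formats visible in this thread, and `SAMPLE` holds a few excerpts from the log with the timestamp prefix dropped. It counts each message kind, the inodes hit by data csum failures, and the distinct tree block addresses that failed the "bad tree block start" check.

```python
import re
from collections import Counter

# Patterns derived from the message formats seen in the log in this thread.
CSUM_FAILED = re.compile(r"csum failed root (\d+) ino (\d+) off (\d+)")
BAD_TREE = re.compile(r"bad tree block start, want (\d+) have (\d+)")
CSUM_HOLE = re.compile(r"csum hole found for disk bytenr range \[(\d+), (\d+)\)")


def summarize(lines):
    """Return (message kind counts, per-inode csum failures, bad block addresses)."""
    kinds = Counter()
    inodes = Counter()
    bad_blocks = set()
    for line in lines:
        m = CSUM_FAILED.search(line)
        if m:
            kinds["csum failed"] += 1
            inodes[int(m.group(2))] += 1
            continue
        m = BAD_TREE.search(line)
        if m:
            kinds["bad tree block"] += 1
            bad_blocks.add(int(m.group(1)))  # the expected ("want") address
            continue
        if CSUM_HOLE.search(line):
            kinds["csum hole"] += 1
    return kinds, inodes, bad_blocks


# Excerpts from the log in this thread, timestamps removed.
SAMPLE = [
    "BTRFS warning (device sdb1): csum failed root 5 ino 97395 off 14105460736 "
    "csum 0x75ab540e expected csum 0xaeb99694 mirror 1",
    "BTRFS error (device sdb1): bad tree block start, want 9343812173824 "
    "have 9856732008096476660",
    "BTRFS error (device sdb1): bad tree block start, want 9343806013440 "
    "have 757116834938933",
    "BTRFS warning (device sdb1): csum hole found for disk bytenr range "
    "[9622003011584, 9622003015680)",
    "BTRFS warning (device sdb1): csum failed root 5 ino 173568 off 3875945435136 "
    "csum 0x23ed6941 expected csum 0xc096fec5 mirror 1",
    "BTRFS error (device sdb1): bad tree block start, want 9343812173824 "
    "have 9856732008096476660",
]

kinds, inodes, bad_blocks = summarize(SAMPLE)
print(dict(kinds))
print(dict(inodes))
print(sorted(bad_blocks))
```

To run it against a real log, pass the lines of a saved journal (for example, a file opened with `open(path)`) to `summarize`. In the full log above, the "bad tree block start" messages refer to three distinct expected addresses, and the data csum failures fall on two inodes (97395 and 173568).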