Thread: 8 messages, 4 authors, 2021-10-18

Re: csum failed, bad tree, block, IO failures. Is my drive dead or has my BTRFS broke itself?

From: Qu Wenruo <hidden>
Date: 2021-10-16 03:30:27


On 2021/10/16 11:18, James Harvey wrote:
I have attached the full journalctl from the boot where this first
happened. Note that this happened again after a scrub and a reboot
during a different write operation. I'm currently doing a backup (not
overwriting any of my other backups), so I will do a memory test to
see if I have bad RAM. I don't have ECC memory so I can't easily
check.

With the full dmesg, it's much clearer how corrupted the fs is:

 > kernel: BTRFS warning (device sdb1): csum failed root 5 ino 97395 off
12255248384 csum 0xd6230a4c expected csum 0x723d189a mirror 1

The earlier errors are mostly data corruption.

From those alone, there is still no way to tell how badly the fs is
corrupted or what is going wrong.

But the next ones give us quite a clue:

 > BTRFS error (device sdb1): bad tree block start, want 9344471629824
have 5162927840984877996

The bytenr we got is complete garbage.

This means some (in fact quite a few) metadata blocks have been
completely overwritten with garbage.
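
To illustrate why the "have" value reads as garbage, here is a minimal Python sketch that pulls the want/have pair out of such a log line and applies a rough plausibility test. The 4096-byte alignment and the 1 PiB upper bound are assumptions chosen for this example, not limits taken from btrfs itself, and the helper names are hypothetical.

```python
import re

# Assumed bounds for this sketch only; not values btrfs itself uses.
SECTOR_SIZE = 4096       # assumed minimum alignment of a tree block start
MAX_PLAUSIBLE = 1 << 50  # 1 PiB, far above the address range of a 16TB drive

BAD_START = re.compile(r"bad tree block start, want (\d+) have (\d+)")


def looks_like_bytenr(value):
    """Heuristic: sector-aligned and inside a sane logical address range."""
    return value % SECTOR_SIZE == 0 and 0 < value < MAX_PLAUSIBLE


def check_line(line):
    """Return (want, have, have_is_plausible), or None if the line does not match."""
    m = BAD_START.search(line)
    if m is None:
        return None
    want, have = int(m.group(1)), int(m.group(2))
    return want, have, looks_like_bytenr(have)


if __name__ == "__main__":
    line = ("BTRFS error (device sdb1): bad tree block start, "
            "want 9344471629824 have 5162927840984877996")
    print(check_line(line))
```

The expected bytenr passes both checks, while the value read back is neither sector-aligned nor anywhere near the size of the device, which is consistent with the block having been overwritten rather than with a single flipped bit.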

Considering the context, it looks like the csum tree has suffered major
corruption.

And this is not a common symptom of a memory bitflip, but of really
corrupted data on disk.

btrfs check should detect such a problem; if it does not, you can try
"btrfs check --check-data-csum", which should report tons of corruption.
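
As a rough sketch of how that check could be scripted, the Python below builds the command line and runs it on request. It assumes the filesystem is unmounted, that the script runs with sufficient privileges, and that /dev/sdb1 (taken from the log) is the right device; the function names are hypothetical, and --readonly is added here only as a precaution.

```python
import subprocess


def build_check_cmd(device):
    """Build a read-only btrfs check that also verifies data checksums."""
    return ["btrfs", "check", "--readonly", "--check-data-csum", device]


def run_check(device):
    """Run the check and return (exit code, combined output).

    Assumes the device is unmounted and the caller has root privileges.
    """
    result = subprocess.run(build_check_cmd(device),
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    # Only print the command here; run_check() has to be called explicitly.
    print(" ".join(build_check_cmd("/dev/sdb1")))
```

On a filesystem of this size, verifying every data checksum means reading all the used data, so expect the run to take a long time.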

I have no idea how this could happen; maybe disk corruption, or maybe
some other problem.

Thanks,
Qu
On Sat, 16 Oct 2021 at 02:52, Qu Wenruo [off-list ref] wrote:
quoted


On 2021/10/16 08:14, James Harvey wrote:
quoted
My server consists of a single 16TB external drive (I have backups,
and I was planning to make a proper server at some point) and I used
BTRFS for the drive's filesystem. Recently, the file system would go
into read only and put a load of errors into the system logs. Running
a BTRFS scrub returned no errors, a readonly BTRFS check returned no
errors, and a SMART check showed no issues/bad sectors.

This is very strange, as normally if there is really on-disk corruption,
especially in metadata, btrfs check should detect it.
quoted
Has BTRFS
broke itself or is this a drive issue:

Here are the errors:

Could you please provide the full dmesg?

We want the context to get a whole picture of the problem, not just
the error messages from btrfs.

If the problem only happens at write time, maybe you want to do a memory
test in the meantime to verify it's not some bitflip in your memory.

Thanks,
Qu
quoted
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105460736 csum 0x75ab540e expected csum
0xaeb99694 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105464832 csum 0xe83b4c2a expected csum
0xb9a65172 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105468928 csum 0x4769b37a expected csum
0x3598cf9e mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105473024 csum 0x7c39a990 expected csum
0x9c523a6c mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105477120 csum 0xfedc09f1 expected csum
0x68386e9a mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105481216 csum 0xf9f25835 expected csum
0x96d2dea3 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105485312 csum 0x37643155 expected csum
0x6139f8a1 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105489408 csum 0x13893c06 expected csum
0xb28c00a8 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105493504 csum 0x2a89fcff expected csum
0x4c5758ed mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 97395 off 14105497600 csum 0x7484b77c expected csum
0x0a9f3138 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bdev
/dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343806013440 have 757116834938933
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9622003011584, 9622003015680)
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343806013440 have 757116834938933
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9622003015680, 9622003019776)
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343947784192 have 17536680014548819927
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343947784192 have 17536680014548819927
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9644356001792, 9644356005888)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS error (device sdb1): bad
tree block start, want 9343812173824 have 9856732008096476660
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9622003019776, 9622003023872)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9644356005888, 9644356009984)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9622003023872, 9622003027968)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9633973551104, 9633973555200)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9644356009984, 9644356014080)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9622003027968, 9622003032064)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
hole found for disk bytenr range [9633973555200, 9633973559296)
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:37 James-Server kernel: BTRFS warning (device sdb1): csum
failed root 5 ino 173568 off 3875945435136 csum 0x23ed6941 expected
csum 0xc096fec5 mirror 1
Oct 14 21:50:41 James-Server kernel: BTRFS: error (device sdb1) in
btrfs_finish_ordered_io:3064: errno=-5 IO failure
Oct 14 21:50:41 James-Server kernel: BTRFS info (device sdb1): forced readonly
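
A log excerpt like the one above is easier to reason about once the mail-wrapped lines are rejoined and grouped by error type. The following Python is a minimal sketch of that, assuming syslog-style timestamps at the start of each entry; the category labels and function names are made up for this example.

```python
import re
from collections import Counter

# A journal entry starts with a syslog-style timestamp such as "Oct 14 21:50:37".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
CSUM_FAILED = re.compile(r"csum failed root (\d+) ino (\d+) off (\d+)")

# Substring to look for -> label used in the summary (labels are illustrative).
CATEGORIES = {
    "csum failed": "data checksum mismatch",
    "bad tree block start": "metadata block header mismatch",
    "csum hole found": "missing checksum item",
    "forced readonly": "filesystem forced read-only",
}


def join_wrapped(text):
    """Merge mail-wrapped continuation lines back into one string per entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(text):
    """Count entries per category and collect csum-failure offsets per inode."""
    counts = Counter()
    offsets = {}
    for entry in join_wrapped(text):
        for needle, label in CATEGORIES.items():
            if needle in entry:
                counts[label] += 1
        m = CSUM_FAILED.search(entry)
        if m:
            offsets.setdefault(int(m.group(2)), []).append(int(m.group(3)))
    return counts, offsets
```

Calling summarize() on the pasted log text returns the per-category counts and a dict mapping each inode to its failing offsets. In the excerpt above, the offsets reported for ino 97395 advance by 4096 each time, i.e. ten consecutive 4 KiB blocks of the same file fail their checksum.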

uname -a: Linux James-Server 5.14.11-arch1-1 #1 SMP PREEMPT Sun, 10
Oct 2021 00:48:26 +0000 x86_64 GNU/Linux

btrfs --version: btrfs-progs v5.14.2

btrfs fi show:

Label: 'Seagate 16TB 1'  uuid: e183a876-95e0-4d15-a641-69f4a8e8e7e7
         Total devices 1 FS bytes used 9.61TiB
         devid    1 size 14.55TiB used 9.62TiB path /dev/sdb1

btrfs fi df:

Data, single: total=9.60TiB, used=9.60TiB
System, DUP: total=8.00MiB, used=1.09MiB
Metadata, DUP: total=11.00GiB, used=10.74GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Mount options: rw,noatime,compress=zstd:3,space_cache=v2,autodefrag,subvolid=5,subvol=/