Thread (11 messages, 5 authors), 2021-11-22

Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?

From: David Sterba <hidden>
Date: 2021-07-21 17:47:17

On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
> Hi,
>
> This was a single-disk filesystem with DUP metadata, and this week it
> stopped mounting out of the blue. The data is not a concern, since I
> have a full fs snapshot on another server; I'm just curious why this
> happened. I remember reading that some WD disks have firmware with
> write cache issues, and I believe this disk is affected:
>
> Model family:     Western Digital Green
> Device model:     WDC WD20EZRX-00D8PB0
> Firmware version: 80.00A80

For the record, summing up the discussion on IRC with Zygo: firmware
version 80.00A80 on WD Green drives is known to be problematic, and it
would explain the observed errors.

The recommendation is either not to use WD Green drives, or to
periodically disable the write cache with 'hdparm -W0' (for example at
every boot, as the setting may not survive a power cycle).

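As a hedged illustration, the sketch below flags a drive whose identity matches the combination reported in this thread, given smartctl -i style "Key: value" lines. The function names and the single flagged (model family, firmware) pair are assumptions taken only from this thread; it is not an authoritative list of affected drives.

```python
from typing import Dict, Iterable

# Hypothetical set of (model family, firmware) pairs to flag. The only entry
# is the combination discussed in this thread; it is not a complete list.
SUSPECT_FIRMWARE = {("western digital green", "80.00a80")}


def parse_identity(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'Key: value' lines (smartctl -i style) into a lowercase-keyed dict."""
    info: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            info[key.strip().lower()] = value.strip()
    return info


def has_suspect_firmware(info: Dict[str, str]) -> bool:
    """Return True if the model family and firmware match a flagged pair."""
    family = info.get("model family", "").lower()
    firmware = info.get("firmware version", "").lower()
    return (family, firmware) in SUSPECT_FIRMWARE


report = [
    "Model family:Western Digital Green",
    "Device model:WDC WD20EZRX-00D8PB0",
    "Firmware version:80.00A80",
]
print(has_suspect_firmware(parse_identity(report)))  # True
```

This only identifies candidates; the mitigation itself is still the 'hdparm -W0' call (or replacing the drive).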
> SMART looks mostly OK, except that "Raw read error rate" is high, which
> in my experience is never a good sign on these disks. But I haven't had
> any read errors so far, and no unclean shutdowns either: it was working
> normally the last time I mounted it, and after a clean shutdown,
> probably just after deleting some snapshots, I now get this:
>
> Jul 16 23:27:38 TV1 emhttpd: shcmd (129): mount -t btrfs -o noatime,nodiratime /dev/md20 /mnt/disk20
> Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): using free space tree
> Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): has skinny extents
> Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block start, want 419774464 have 0

When the 'have' value is zero, it means the block was empty, e.g.
trimmed or not written at all.

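A minimal sketch of the comparison behind that message, assuming the on-disk btrfs_header layout (32-byte csum, 16-byte fsid, then a little-endian u64 bytenr) and the default 16 KiB nodesize. The function names are hypothetical and this is not the kernel's code; it only shows why a block that reads back as all zeros reports "have 0".

```python
import struct
from typing import Optional

CSUM_SIZE = 32   # btrfs_header.csum field
FSID_SIZE = 16   # btrfs_header.fsid field
BYTENR_OFFSET = CSUM_SIZE + FSID_SIZE  # little-endian u64 bytenr follows


def header_bytenr(block: bytes) -> int:
    """Return the logical address the tree block header claims for itself."""
    (bytenr,) = struct.unpack_from("<Q", block, BYTENR_OFFSET)
    return bytenr


def check_tree_block_start(block: bytes, want: int) -> Optional[str]:
    """The header bytenr must equal the address the block was read from."""
    have = header_bytenr(block)
    if have != want:
        return f"bad tree block start, want {want} have {have}"
    return None


NODESIZE = 16384  # assumed default nodesize
empty = bytes(NODESIZE)  # a block that reads back as all zeros
print(check_tree_block_start(empty, 419774464))
# bad tree block start, want 419774464 have 0

good = bytearray(NODESIZE)
struct.pack_into("<Q", good, BYTENR_OFFSET, 419774464)
print(check_tree_block_start(bytes(good), 419774464))  # None
```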
> Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block start, want 419774464 have 0
> Jul 16 23:27:38 TV1 kernel: BTRFS warning (device md20): failed to read root (objectid=2): -5

> The kernel is kind of old (4.19.107), but there are 21 more btrfs
> filesystems on this server, some using identical disks, with no issues
> for a long time until now. btrfs check output:

> ~# btrfs check /dev/md20
> Opening filesystem to check...
> checksum verify failed on 419774464 found 000000B6 wanted 00000000
> checksum verify failed on 419774464 found 00000058 wanted 00000000
> checksum verify failed on 419774464 found 000000B6 wanted 00000000
                                            ^^^^^^^^

This is an artifact of incorrectly printed checksums, fixed in
btrfs-progs v5.11.1.

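For context, a minimal sketch of how a tree block checksum can be verified, assuming the default crc32c checksum type: the 4-byte little-endian value stored at the start of the block is compared with CRC-32C computed over everything after the 32-byte csum field. The function names are hypothetical and this is not btrfs-progs code; a correct printout needs all 32 bits of the checksum, shown as 8 hex digits.

```python
from typing import List, Tuple

CSUM_SIZE = 32            # btrfs_header.csum field; the checksum covers the rest
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _make_table() -> List[int]:
    """Build the 256-entry lookup table for reflected CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C (initial value and final XOR 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def tree_block_csums(block: bytes) -> Tuple[int, int]:
    """Return (stored, computed) checksums for a crc32c tree block."""
    stored = int.from_bytes(block[:4], "little")
    computed = crc32c(block[CSUM_SIZE:])
    return stored, computed


def format_csum(value: int) -> str:
    """Show the full 32-bit checksum as 8 hex digits."""
    return f"{value:08X}"


NODESIZE = 16384  # assumed default nodesize
block = bytearray(NODESIZE)  # an all-zero (e.g. never written) block
stored, computed = tree_block_csums(bytes(block))
print(format_csum(stored))  # 00000000
block[:4] = computed.to_bytes(4, "little")  # store a valid checksum
print(tree_block_csums(bytes(block))[0] == computed)  # True
```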
> bad tree block 419774464, bytenr mismatch, want=419774464, have=0
> ERROR: could not setup extent tree
> ERROR: cannot open file system
>
> Could this type of error be explained by buggy disk firmware?
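As a reading aid, the kernel warning "failed to read root (objectid=2): -5" and the btrfs check error "could not setup extent tree" appear to describe the same failure: objectid 2 is the btrfs extent tree, and -5 is -EIO. The sketch below decodes such a pair; the table is a hand-picked subset of well-known btrfs tree objectids, and the function name is hypothetical.

```python
import errno

# A subset of well-known btrfs tree objectids from the on-disk format.
TREE_OBJECTIDS = {
    1: "root tree",
    2: "extent tree",
    3: "chunk tree",
    4: "device tree",
    5: "fs tree",
    7: "checksum tree",
    10: "free space tree",
}


def describe_root_failure(objectid: int, ret: int) -> str:
    """Turn 'failed to read root (objectid=N): -E' into readable names."""
    tree = TREE_OBJECTIDS.get(objectid, f"objectid {objectid}")
    # The kernel reports negative errno values, so negate before lookup.
    name = errno.errorcode.get(-ret, str(ret))
    return f"{tree}: {name}"


print(describe_root_failure(2, -5))  # extent tree: EIO
```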