Re: "bad tree block start, want 419774464 have 0" after a clean shutdown, could it be a disk firmware issue?
From: David Sterba <hidden>
Date: 2021-07-21 17:47:17
On Fri, Jul 16, 2021 at 11:44:21PM +0100, Jorge Bastos wrote:
Hi,

This was a single-disk filesystem with DUP metadata, and this week it stopped mounting out of the blue. The data is not a concern since I have a full fs snapshot on another server; I'm just curious why this happened. I remember reading that some WD disks have firmware with write cache issues, and I believe this disk is affected:

Model family:     Western Digital Green
Device model:     WDC WD20EZRX-00D8PB0
Firmware version: 80.00A80
For the record, summing up the discussion from IRC with Zygo: this particular firmware, 80.00A80 on WD Green, is known to be problematic and would explain the observed errors. The recommendation is either not to use WD Green disks or to periodically disable the write cache with 'hdparm -W0'.
SMART looks mostly OK, except "Raw read error rate" is high, which in my experience is never a good sign on these disks, but I didn't get any read errors so far, also no unclean shutdown, it was working normally last time I mounted it, and after a clean shutdown, probably just after deleting some snapshots, I now get this:

Jul 16 23:27:38 TV1 emhttpd: shcmd (129): mount -t btrfs -o noatime,nodiratime /dev/md20 /mnt/disk20
Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): using free space tree
Jul 16 23:27:38 TV1 kernel: BTRFS info (device md20): has skinny extents
Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block start, want 419774464 have 0
When the 'have' values are zeros, it means the blocks were empty, e.g. trimmed or never written at all.
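As a rough illustration of the 'want'/'have' comparison, here is a minimal Python sketch. It assumes the tree block header layout of a 32-byte checksum, a 16-byte fsid, and then the little-endian 64-bit bytenr of the block itself; the function names and the 16 KiB block size are hypothetical and chosen for the example, and this is not the kernel's actual code path.

```python
import struct

# Assumed header layout: 32-byte checksum, 16-byte fsid, then the
# little-endian 64-bit bytenr the block claims to live at.
CSUM_SIZE = 32
FSID_SIZE = 16
BYTENR_OFFSET = CSUM_SIZE + FSID_SIZE


def header_bytenr(block: bytes) -> int:
    """Return the bytenr stored in a raw tree block header."""
    return struct.unpack_from("<Q", block, BYTENR_OFFSET)[0]


def describe_block(block: bytes, want: int) -> str:
    """Imitate the 'want/have' check for one raw block (illustrative only)."""
    have = header_bytenr(block)
    if have == want:
        return "ok"
    msg = f"bad tree block start, want {want} have {have}"
    if not any(block):
        # Every byte is zero: the block was trimmed or never written.
        msg += " (block is all zeros)"
    return msg


if __name__ == "__main__":
    # A block that was never written reads back as zeros, so 'have' is 0.
    print(describe_block(bytes(16384), 419774464))
```

A block whose header carries the expected bytenr yields "ok"; any other non-matching content reports the stale or foreign bytenr found in the header instead of 0.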
Jul 16 23:27:38 TV1 kernel: BTRFS error (device md20): bad tree block start, want 419774464 have 0
Jul 16 23:27:38 TV1 kernel: BTRFS warning (device md20): failed to read root (objectid=2): -5

Kernel is kind of old, 4.19.107, but there are 21 more btrfs file systems on this server, some using identical disks and no issues for a long time until now, btrfs check output:

~# btrfs check /dev/md20
Opening filesystem to check...
checksum verify failed on 419774464 found 000000B6 wanted 00000000
checksum verify failed on 419774464 found 00000058 wanted 00000000
checksum verify failed on 419774464 found 000000B6 wanted 00000000
^^^^^^^^ This is an artifact of incorrectly printed checksums, fixed in btrfs-progs v5.11.1
bad tree block 419774464, bytenr mismatch, want=419774464, have=0
ERROR: could not setup extent tree
ERROR: cannot open file system

Could this type of error be explained by a bad disk firmware?