Re: bug: btrfs device stats not showing raid1 errors
From: waxhead <hidden>
Date: 2021-09-21 01:43:17
Chris Murphy wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=2005987
>
> Various kernel messages like this:
>
> [2634355.709564] BTRFS info (device sda3): read error corrected: ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)
> [2634355.733898] BTRFS info (device sda3): read error corrected: ino 27902168 off 8749056 (dev /dev/sda3 sector 52960056)
>
> And yet 'btrfs dev stats' does not show an increment in tracked statistics, in particular read_io_errs.
This is extremely confusing for me as well, and I am just a BTRFS user... I am a BTRFS "enthusiast", if there is such a thing, and if this seems wrong to me (regardless of whether it actually is wrong), imagine the frustration and confusion for those who are not that into filesystems.
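To see how far the kernel log and the counters diverge for messages like the ones quoted above, one could tally the "read error corrected" lines per device and compare the result with what 'btrfs dev stats' reports. This is a minimal sketch that assumes the exact message layout shown in the quoted log (it may differ between kernel versions); count_corrected_errors is a hypothetical helper, not part of btrfs-progs.

```python
import re
from collections import Counter

# Assumption: matches the "read error corrected" layout quoted above.
CORRECTED_RE = re.compile(
    r"BTRFS info \(device [^)]+\): read error corrected: "
    r"ino (?P<ino>\d+) off (?P<off>\d+) \(dev (?P<dev>\S+) sector (?P<sector>\d+)\)"
)


def count_corrected_errors(log_lines):
    """Hypothetical helper: count corrected read errors per device in kernel log lines."""
    counts = Counter()
    for line in log_lines:
        match = CORRECTED_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2634355.709564] BTRFS info (device sda3): read error corrected: "
        "ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)",
        "[2634355.733898] BTRFS info (device sda3): read error corrected: "
        "ino 27902168 off 8749056 (dev /dev/sda3 sector 52960056)",
    ]
    print(dict(count_corrected_errors(sample)))  # {'/dev/sda3': 2}
```

Fed with the output of dmesg or journalctl -k, a nonzero count for a device whose 'btrfs dev stats' counters all stay at 0 is exactly the mismatch described in the bug report.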
> This does seem like suboptimal behavior. We discussed it a bit on IRC today, and Zygo found the behavior was introduced with commit
>
> 0cc068e6ee59 btrfs: don't report readahead errors and don't update statistics
>
> Zygo on IRC writes:
>
> readahead errors are things like "out of memory" or device-mapper nonsense, so the best is 'don't correct and don't count', since there's probably nothing wrong with the underlying media. But if there is something wrong with the underlying media, we want a proper read, correct, and count to happen, which means we can safely do nothing during readahead. So the right answer is don't correct and don't count.
>
> ---
>
> I'm not sure how noisy it could be to always report such read errors discovered during readahead, but my gut instinct is that any time there's a read error, whether physical or virtual, we probably want to know about it? If these are bogus errors, then that suggests (a) do not increment the dev stats counter, and also (b) do not fix up.
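The consistency argument above (a bogus readahead error should be neither counted nor fixed up, while a real error found by a regular read should be both) can be written down as a small decision function. This is a hypothetical model for discussion only, not the kernel's code path; ReadKind, Action, handle_read_error and is_consistent are made-up names.

```python
from enum import Enum
from typing import NamedTuple


class ReadKind(Enum):
    READAHEAD = "readahead"
    REGULAR = "regular"


class Action(NamedTuple):
    count: bool   # increment a device stats counter (e.g. read_io_errs)
    repair: bool  # rewrite the bad copy from a good mirror


def handle_read_error(kind: ReadKind) -> Action:
    """Hypothetical policy following Zygo's description: readahead failures are
    likely not media errors, so do nothing and let a later regular read
    detect, repair and count any real problem."""
    if kind is ReadKind.READAHEAD:
        return Action(count=False, repair=False)
    return Action(count=True, repair=True)


def is_consistent(action: Action) -> bool:
    """Invariant argued for in the thread: never repair without counting."""
    return action.count or not action.repair


# What the bug report shows: the error was corrected but not counted.
observed = Action(count=False, repair=True)

if __name__ == "__main__":
    for kind in ReadKind:
        print(kind.value, handle_read_error(kind))
    print("observed behavior consistent:", is_consistent(observed))
```

Under this model both policy branches satisfy the invariant, and only the observed "corrected but not counted" combination violates it.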
...And in case someone clears this up, please consider a table output option like the one in btrfs fi us -T /mnt, e.g. btrfs de st -T /mnt, that outputs something like:

Device stat ErrWrite ErrRead ErrFlush ErrCorrupt ErrGen
----------- -------- ------- -------- ---------- ------
/dev/sdb1          0       1        2          0      3
/dev/sdt1          0       2        3          0      4
/dev/sdr1          0       3        4          0      6
/dev/sdf1          0       4        5          0      7
/dev/sds1          0       5        6          0      8

instead of, or in addition to:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdt1].write_io_errs 0
[/dev/sdt1].read_io_errs 0
[/dev/sdt1].flush_io_errs 0
[/dev/sdt1].corruption_errs 0
[/dev/sdt1].generation_errs 0
...etc...

The current list, which repeats the device name on every line, takes up an awful lot of space if you have plenty of storage devices. I have 18 hard drives in a BTRFS pool, and the btrfs de st list is annoyingly long... A table would be nice, or simply SKIP printing the lines where the stat counter == 0, as they are simply not needed.
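Until something like this exists in btrfs-progs, the proposed table can be produced by post-processing the current output. This is a minimal sketch assuming the "[device].counter value" line format shown above; parse_stats and format_table are hypothetical helpers, and the short column names are simply the ones proposed in this mail, not an existing btrfs-progs option.

```python
import re

# Counter names printed by 'btrfs device stats', mapped to the proposed headers.
COLUMNS = {
    "write_io_errs": "ErrWrite",
    "read_io_errs": "ErrRead",
    "flush_io_errs": "ErrFlush",
    "corruption_errs": "ErrCorrupt",
    "generation_errs": "ErrGen",
}
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<stat>\w+)\s+(?P<value>\d+)$")


def parse_stats(lines):
    """Group '[dev].stat value' lines into {dev: {stat: value}}, keeping device order."""
    stats = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("stat") in COLUMNS:
            per_dev = stats.setdefault(match.group("dev"), {})
            per_dev[match.group("stat")] = int(match.group("value"))
    return stats


def format_table(stats, skip_zero=False):
    """One row per device; with skip_zero, drop devices whose counters are all 0."""
    header = ["Device stat"] + list(COLUMNS.values())
    rows = []
    for dev, values in stats.items():
        counts = [values.get(stat, 0) for stat in COLUMNS]
        if skip_zero and not any(counts):
            continue
        rows.append([dev] + [str(c) for c in counts])
    widths = [max(len(row[i]) for row in [header] + rows) for i in range(len(header))]

    def fmt(row):
        # Device name left-aligned, counters right-aligned under their headers.
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        return " ".join(cells)

    lines = [fmt(header), " ".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        "[/dev/sdb1].write_io_errs 0",
        "[/dev/sdb1].read_io_errs 1",
        "[/dev/sdb1].flush_io_errs 2",
        "[/dev/sdb1].corruption_errs 0",
        "[/dev/sdb1].generation_errs 3",
        "[/dev/sdt1].write_io_errs 0",
        "[/dev/sdt1].read_io_errs 0",
        "[/dev/sdt1].flush_io_errs 0",
        "[/dev/sdt1].corruption_errs 0",
        "[/dev/sdt1].generation_errs 0",
    ]
    print(format_table(parse_stats(sample), skip_zero=True))
```

In practice one would pass the captured output of btrfs device stats /mnt, split into lines, to parse_stats. In table form, the "skip counters that are 0" idea becomes skipping devices whose whole row is 0, which keeps the columns aligned while still shortening an 18-drive listing to just the devices with errors.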