Re: bug: btrfs device stats not showing raid1 errors

From: waxhead <hidden>
Date: 2021-09-21 01:43:17

Chris Murphy wrote:

https://bugzilla.redhat.com/show_bug.cgi?id=2005987

Various kernel messages like this:

[2634355.709564] BTRFS info (device sda3): read error corrected: ino 27902168 off 8773632 (dev /dev/sda3 sector 52960104)
[2634355.733898] BTRFS info (device sda3): read error corrected: ino 27902168 off 8749056 (dev /dev/sda3 sector 52960056)

And yet 'btrfs dev stats' does not show an increment in tracked
statistics, in particular read_io_errs

This is extremely confusing for me as well, and I am just a BTRFS user...
I am a BTRFS "enthusiast", if there is such a thing, and if this seems
wrong to me (regardless of whether it is wrong or not), imagine the
frustration and confusion for those not that into filesystems.

This does seem like suboptimal behavior.  Discussed a bit on IRC today
and Zygo found the behavior is introduced with commit 0cc068e6ee59
btrfs: don't report readahead errors and don't update statistics

Zygo on IRC writes:
readahead errors are things like "out of memory" or device-mapper nonsense
so the best is 'don't correct and don't count'
since there's probably nothing wrong with the underlying media
but if there is something wrong with the underlying media, we want a
proper read, correct, and count to happen
which means we can safely do nothing during readahead
so the right answer is don't correct and don't count
---

I'm not sure how noisy it could be to always report such read errors
discovered during read ahead, but my gut instinct is that anytime
there's a read error whether physical or virtual, we probably want to
know about this? If these are bogus errors then that suggests (a) do
not increment the dev stats counter, and also (b) do not fix up.

...And in case someone clears this up, please consider a table output
option like the one in btrfs fi us -T /mnt, e.g. btrfs de st -T /mnt,
that outputs something like

Device stat ErrWrite ErrRead ErrFlush ErrCorrupt ErrGen
----------- -------- ------- -------- ---------- ------
/dev/sdb1          0       1        2          0      3
/dev/sdt1          0       2        3          0      4
/dev/sdr1          0       3        4          0      6
/dev/sdf1          0       4        5          0      7
/dev/sds1          0       5        6          0      8

instead of, or in addition to...

[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdt1].write_io_errs    0
[/dev/sdt1].read_io_errs     0
[/dev/sdt1].flush_io_errs    0
[/dev/sdt1].corruption_errs  0
[/dev/sdt1].generation_errs  0
...etc...

The current list, which prints five lines per device, takes up an awful
lot of space if you have plenty of storage devices. I have 18 hard drives
in a BTRFS pool and the btrfs de st list is annoyingly long...

A table would be nice, or simply SKIP printing the lines where the stat
counter == 0, as those lines are simply not needed.
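
Until something like that exists in the tool itself, the reshaping can be
done outside of it. Below is a minimal sketch, not a definitive
implementation: it assumes the per-line format shown above
([device].counter value) and the five counter names from that listing.
The function names (parse_dev_stats, nonzero, format_table) and the short
column labels are made up for illustration and are not part of
btrfs-progs.

```python
import re

# One line of `btrfs device stats` output, e.g. "[/dev/sdb1].read_io_errs     0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<stat>\w+)\s+(?P<val>\d+)$")

# Counter names as they appear in the listing, paired with the short
# column labels proposed in this mail (the labels are an assumption).
COLUMNS = [
    ("write_io_errs", "ErrWrite"),
    ("read_io_errs", "ErrRead"),
    ("flush_io_errs", "ErrFlush"),
    ("corruption_errs", "ErrCorrupt"),
    ("generation_errs", "ErrGen"),
]


def parse_dev_stats(text):
    """Return {device: {counter: value}} in the order devices first appear."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:  # silently skip anything that is not a counter line
            stats.setdefault(match["dev"], {})[match["stat"]] = int(match["val"])
    return stats


def nonzero(stats):
    """Keep only the counters that are not 0; drop devices with none left."""
    trimmed = {}
    for dev, counters in stats.items():
        kept = {name: value for name, value in counters.items() if value}
        if kept:
            trimmed[dev] = kept
    return trimmed


def format_table(stats):
    """Render one row per device and one right-aligned column per counter."""
    names = ["Device stat"] + [label for _, label in COLUMNS]
    widths = [max([len(names[0])] + [len(dev) for dev in stats])]
    for key, label in COLUMNS:
        values = [len(str(c.get(key, 0))) for c in stats.values()]
        widths.append(max([len(label)] + values))

    lines = [
        " ".join(n.ljust(w) for n, w in zip(names, widths)).rstrip(),
        " ".join("-" * w for w in widths),
    ]
    for dev, counters in stats.items():
        cells = [dev.ljust(widths[0])]
        for (key, _), width in zip(COLUMNS, widths[1:]):
            cells.append(str(counters.get(key, 0)).rjust(width))
        lines.append(" ".join(cells))
    return "\n".join(lines)
```

Feeding it the text printed by btrfs device stats /mnt (captured however
you like) and printing format_table(parse_dev_stats(text)) gives one row
per device; passing the parsed result through nonzero() first instead
leaves only the counters worth looking at.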
