Re: [PATCH net-next 1/3] net: ethtool: Track pause storm events
From: Jakub Kicinski <kuba@kernel.org>
Date: 2026-01-23 22:15:29
On Fri, 23 Jan 2026 22:27:19 +0100 Oleksij Rempel wrote:
> > +      -
> > +        name: tx-pause-storm-events
> > +        type: u64
> > +        doc: >-
> > +          TX pause storm event count. Increments each time device
> > +          detects that its pause assertion condition has been true
> > +          for too long for normal operation. As a result, the device
> > +          has temporarily disabled its own Pause TX function to
> > +          protect the network from itself.
> > +          This counter should never increment under normal overload
> > +          conditions; it indicates catastrophic failure like an OS
> > +          crash. The rate of incrementing is implementation specific.
>
> Hm, we already have the tx pause frame counters. So, the anomaly is visible to the user anyway (even if it isn't explicitly labeled as an anomaly).
We are trying to prove a negative here, that's why we need a new counter. As the doc says, this counter should show that a storm is never actually detected under normal conditions. Another thing to keep in mind is that we're talking about metric collection at scale, so sampling every 1 to 5 minutes.
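To illustrate the sampling point, here is a minimal sketch of what a collector sees between two polls. Everything in it is an assumption made for illustration: the `Sample` type, the `classify` helper and the example numbers are hypothetical, and the field names merely mirror the counter proposed in the patch. It does not show how any real collector reads these stats.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of the (hypothetical) per-port pause counters."""
    tx_pause_frames: int
    tx_pause_storm_events: int  # name taken from the proposed counter


def classify(prev: Sample, curr: Sample) -> str:
    """Classify one polling interval from counter deltas only."""
    frames = curr.tx_pause_frames - prev.tx_pause_frames
    storms = curr.tx_pause_storm_events - prev.tx_pause_storm_events
    if storms > 0:
        return "storm"   # device disabled its own Pause TX at least once
    if frames > 0:
        return "paused"  # pause frames sent, nothing flagged as abnormal
    return "quiet"


# Two intervals with an identical pause frame delta (made-up numbers).
# Only the dedicated counter tells them apart at a 1 to 5 minute poll.
overload = classify(Sample(1_000, 0), Sample(51_000, 0))
storm = classify(Sample(1_000, 0), Sample(51_000, 1))
print(overload, storm)
```

With coarse polling the pause frame delta alone cannot say whether the device ever tripped its storm protection, while a zero delta on the event counter is the "negative" being proven.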
> What is not visible to the user is when HW or SW disables flow control. Maybe that is what the counter should represent and be named? Would tx-pause-auto-disabled-events make sense?
According to our existing uAPI for PFC, "pause storm" is the term of art.
> The reason I do not like tx-pause-storm-events is that the meaning is device specific; the user has to read the device manual to know what it actually means. tx-pause-auto-disabled-events can be reused in more cases - every time we try to pause flow control for some reason.
TBH I feel like you may be overestimating your ability to do anything like that in the SW here. The silicon can do this cycle-accurate, FIFO pressure never relieved. In SW you have to poll, and if you can poll why not just read the packets from the FIFO and let the pipe move?

On the "device manual" point, pause frames as an estimate of congestion are also quite useless device to device. You have to "read the manual". Different devices use different pause quanta so to speak.
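A rough worked example of the pause quanta point, as a sketch under stated assumptions. The one fixed fact is that IEEE 802.3 defines a pause quantum as 512 bit times. The two "devices", their quanta values, the link speed and the frame count below are all made up for illustration.

```python
PAUSE_QUANTUM_BITS = 512  # IEEE 802.3: one pause quantum is 512 bit times


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Pause time requested by a single PAUSE frame."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def max_paused_seconds(frames: int, quanta: int, link_bps: int) -> float:
    """Upper bound on paused time for a given pause frame count.

    It is only a bound: a new PAUSE frame restarts the peer's timer
    rather than adding to it.
    """
    return frames * pause_seconds(quanta, link_bps)


LINK_10G = 10_000_000_000

# Hypothetical devices that both report 1000 TX pause frames.
dev_a = max_paused_seconds(1000, 0xFFFF, LINK_10G)  # asks for max quanta
dev_b = max_paused_seconds(1000, 0x0100, LINK_10G)  # asks for short pauses
print(f"A: up to {dev_a:.4f} s, B: up to {dev_b:.4f} s")
```

The same frame count covers anything from about 13 ms to over 3 s of requested pause in this made-up case, which is why the raw frame counter does not compare well across devices.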