Thread: 18 messages, 5 authors, 2026-01-26

Re: [PATCH net-next 1/3] net: ethtool: Track pause storm events

From: Jakub Kicinski <kuba@kernel.org>
Date: 2026-01-23 22:15:29

On Fri, 23 Jan 2026 22:27:19 +0100 Oleksij Rempel wrote:
+      -
+        name: tx-pause-storm-events
+        type: u64
+        doc: >-
+            TX pause storm event count. Increments each time device
+            detects that its pause assertion condition has been true
+            for too long for normal operation. As a result, the device
+            has temporarily disabled its own Pause TX function to
+            protect the network from itself.
+            This counter should never increment under normal overload
+            conditions; it indicates catastrophic failure like an OS
+            crash. The rate of incrementing is implementation specific.  

Hm, we already have the tx pause frame counters. So, the anomaly is
visible to the user anyway (even if it isn't explicitly labeled as an
anomaly).

We are trying to prove a negative here, that's why we need a new
counter. As the doc says this counter should indicate that storm
is never actually detected under normal conditions. Another thing
to keep in mind is that we're talking about metric collection at scale,
so every 1min to 5min.

What is not visible to the user is when HW or SW disables flow control.
Maybe that is what the counter should represent and be named? Would
tx-pause-auto-disabled-events make sense?

According to our existing uAPI for PFC, pause storm is the term of art.

The reason I do not like tx-pause-storm-events is that the meaning is
device specific; the user has to read the device manual to know what it
actually means.

tx-pause-auto-disabled-events can be reused in more cases - every time
we try to pause flow control for some reason.
TBH I feel like you may be overestimating your ability to do anything
like that in the SW here. The silicon can do this cycle-accurate, FIFO
pressure never relieved. In SW you have to poll, and if you can poll
why not just read the packets from the fifo and let the pipe move?

On the "device manual" point, pause frames as an estimate of congestion
are also quite useless device to device. You have to "read the manual".
Different devices use different pause quanta so to speak.
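
To illustrate the pause quanta point with numbers: IEEE 802.3 defines one
pause quantum as 512 bit times, so how long a single PAUSE frame holds off
the peer depends on the pause_time value the device puts in the frame and
on the link speed. A rough sketch follows; the function names are
hypothetical and not taken from any driver or from ethtool, and the frame
count is only a lower bound since real devices typically refresh the pause
before the timer expires.

```python
import math

PAUSE_QUANTUM_BITS = 512  # one pause quantum = 512 bit times (IEEE 802.3 Annex 31B)


def pause_duration_us(quanta: int, link_speed_bps: int) -> float:
    """Time one PAUSE frame asks the peer to stay quiet, in microseconds."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    return quanta * PAUSE_QUANTUM_BITS / link_speed_bps * 1e6


def min_frames_to_hold_pause(hold_us: float, quanta: int, link_speed_bps: int) -> int:
    """Lower bound on PAUSE frames needed to keep the peer paused for hold_us.

    Hypothetical helper for illustration only. A pause_time of 0 means
    "resume", so it cannot hold the peer off at all.
    """
    if quanta == 0:
        raise ValueError("pause_time 0 is a resume request, not a pause")
    per_frame_us = pause_duration_us(quanta, link_speed_bps)
    return math.ceil(hold_us / per_frame_us)
```

Under these assumptions, one second of continuous backpressure on a 10 Gb/s
link needs at least 299 frames from a device that sends pause_time 0xFFFF,
but at least 76594 frames from one that sends 0x00FF. The same congestion
produces very different tx pause frame counts, which is why the raw frame
counters are hard to compare across devices.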