Thread: 4 messages, 4 authors, 2025-09-30

Re: [PATCH net-next v7 1/1] Documentation: net: add flow control guide and document ethtool API

From: Jakub Kicinski <kuba@kernel.org>
Date: 2025-09-27 00:19:23
Also in: lkml

On Wed, 24 Sep 2025 14:02:41 +0200 Oleksij Rempel wrote:
     name: pause-stat
+    doc: Statistics counters for link-wide PAUSE frames (IEEE 802.3 Annex 31B).
     attr-cnt-name: __ethtool-a-pause-stat-cnt
+    enum-name: ethtool-a-pause-stat
Naming attribute enums is relatively rare and kinda unnecessary TBH,
because the values are almost never held as state or passed around.
99.9% of the time we use the literals.

enums for actual enum attributes (the value is the enum) - sure,
enums for attr types - 🤷️
         name: stats
+        doc: |
+          Contains the pause statistics counters. The source of these
+          statistics is determined by stats-src.
I'd skip mentioning the source here TBH. Or do we need to briefly describe
what the MM (MAC Merge) layer is? I don't have recent embedded experience,
but I thought MM is relatively rare, so mentioning it for a very common
attribute could confuse readers.
         type: nest
         nested-attributes: pause-stat
       -
         name: stats-src
+        doc: |
+          Selects the source of the MAC statistics, values from
+          enum ethtool_mac_stats_src. This allows requesting statistics
+          from the individual components of the MAC Merge layer.
         type: u32
   -
     name: eee
diff --git a/Documentation/networking/flow_control.rst b/Documentation/networking/flow_control.rst
new file mode 100644
index 000000000000..48646d54513f
--- /dev/null
+++ b/Documentation/networking/flow_control.rst
@@ -0,0 +1,373 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _ethernet-flow-control:
+
+=====================
+Ethernet Flow Control
+=====================
+
+This document is a practical guide to Ethernet Flow Control in Linux, covering
+what it is, how it works, and how to configure it.
+
+What is Flow Control?
+=====================
+
+Flow control is a mechanism to prevent a fast sender from overwhelming a
+slow receiver with data, which would cause buffer overruns and dropped packets.
+The receiver can signal the sender to temporarily stop transmitting, giving it
+time to process its backlog.
+
+Standards references
+====================
+
+Ethernet flow control mechanisms are specified across consolidated IEEE base
nit: "Flow Control"? We should be consistent with the capitalization.
+standards; some originated as amendments:
+
+- Collision-based flow control is part of CSMA/CD in **IEEE 802.3**
+  (half-duplex).
+- Link-wide PAUSE is defined in **IEEE 802.3 Annex 31B**
+  (originally **802.3x**).
+- Priority-based Flow Control (PFC) is defined in **IEEE 802.1Q Clause 36**
+  (originally **802.1Qbb**).
+
+In the remainder of this document, the consolidated clause numbers are used.
+
+How It Works: The Mechanisms
+============================
+
+The method used for flow control depends on the link's duplex mode.
+
+.. note::
+   The user-visible ``ethtool`` pause API described in this document controls
+   **link-wide PAUSE** (IEEE 802.3 Annex 31B) only. It does not control the
+   collision-based behavior that exists on half-duplex links.
 ... or PFC ?
+1. Half-Duplex: Collision-Based Flow Control
+--------------------------------------------
+On half-duplex links, a device cannot send and receive simultaneously, so PAUSE
+frames are not used. Flow control is achieved by leveraging the CSMA/CD
+(Carrier Sense Multiple Access with Collision Detection) protocol itself.
+
+* **How it works**: To inhibit incoming data, a receiving device can force a
+  collision on the line. When the sending station detects this collision, it
+  terminates its transmission, sends a "jam" signal, and then executes the
+  "Collision backoff and retransmission" procedure as defined in IEEE 802.3,
+  Section 4.2.3.2.5. This algorithm makes the sender wait for a random
+  period before attempting to retransmit. By repeatedly forcing collisions,
+  the receiver can effectively throttle the sender's transmission rate.
+
+.. note::
+    While this mechanism is part of the IEEE standard, there is currently no
+    generic kernel API to configure or control it. Drivers should not enable
+    this feature until a standardized interface is available.
+
+.. warning::
+   On shared-medium networks (e.g. 10BASE2, or twisted-pair networks using a
+   hub rather than a switch) forcing collisions inhibits traffic **across the
+   entire shared segment**, not just a single point-to-point link. Enabling
+   such behavior is generally undesirable.
+
+2. Full-Duplex: Link-wide PAUSE (IEEE 802.3 Annex 31B)
+------------------------------------------------------
+On full-duplex links, devices can send and receive at the same time. Flow
+control is achieved by sending a special **PAUSE frame**, defined by IEEE
+802.3 Annex 31B. This mechanism pauses all traffic on the link and is therefore
+called *link-wide PAUSE*.
+
+* **What it is**: A standard Ethernet frame with a globally reserved
+  destination MAC address (``01-80-C2-00-00-01``). This address is in a range
+  that standard IEEE 802.1D-compliant bridges do not forward. However, some
+  unmanaged or misconfigured bridges have been reported to forward these
+  frames, which can disrupt flow control across a network.
+
+* **How it works**: The frame contains a MAC Control opcode for PAUSE
+  (``0x0001``) and a ``pause_time`` value, telling the sender how long to
+  wait before sending more data frames. This time is specified in units of
+  "pause quantum", where one quantum is the time it takes to transmit 512 bits.
+  For example, one pause quantum is 51.2 microseconds on a 10 Mbit/s link,
+  and 512 nanoseconds on a 1 Gbit/s link. A ``pause_time`` of zero indicates
+  that the transmitter can resume transmission, even if a previous non-zero
+  pause time has not yet elapsed.
+
+* **Who uses it**: Any full-duplex link, from 10 Mbit/s to multi-gigabit speeds.
+
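For what it's worth, the quantum arithmetic in the bullets above can be
sanity-checked with a few lines of Python. This is only a sketch: the
function names are made up for illustration, and the 16-bit range of
pause_time is assumed from Annex 31B rather than taken from this patch.

```python
def pause_quantum_ns(link_speed_bps):
    """Length of one pause quantum (512 bit times) in nanoseconds."""
    return 512 * 1_000_000_000 / link_speed_bps


def pause_duration_ns(pause_time, link_speed_bps):
    """Time a sender stays paused for a given pause_time value."""
    # Assumption: pause_time is a 16-bit unsigned field (0..0xFFFF quanta).
    if not 0 <= pause_time <= 0xFFFF:
        raise ValueError("pause_time must fit in 16 bits")
    return pause_time * pause_quantum_ns(link_speed_bps)


if __name__ == "__main__":
    print(pause_quantum_ns(10_000_000), "ns per quantum at 10 Mbit/s")
    print(pause_quantum_ns(1_000_000_000), "ns per quantum at 1 Gbit/s")
    print(pause_duration_ns(0xFFFF, 1_000_000_000) / 1e6, "ms max at 1 Gbit/s")
```

Under that assumption a single PAUSE frame can hold off a 1 Gbit/s sender
for at most about 33.6 ms, which may be worth a sentence in the doc.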
+3. Full-Duplex: Priority-based Flow Control (PFC) (IEEE 802.1Q Clause 36)
+-------------------------------------------------------------------------
+Priority-based Flow Control is an enhancement to the standard PAUSE mechanism
+that allows flow control to be applied independently to different classes of
+traffic, identified by their priority level.
Should we add "... specified in the 802.1Q VLAN tag"?
+
+* **What it is**: PFC allows a receiver to pause traffic for one or more of the
+  8 standard priority levels without stopping traffic for other priorities.
+  This is critical in data center environments for protocols that cannot
+  tolerate packet loss due to congestion (e.g., Fibre Channel over Ethernet
+  or RoCE).
nit: either

 FCoE and RoCE 
   or
 Fibre Channel .. and RDMA over Converged ..

?
+* **How it works**: PFC uses a specific PAUSE frame format. It shares the same
+  globally reserved destination MAC address (``01-80-C2-00-00-01``) as legacy
+  PAUSE frames but uses a unique opcode (``0x0101``). The frame payload
+  contains two key fields:
+Kernel Policy: "Set and Trust"
+==============================
+
+The ethtool pause API is defined as a **wish policy** for
+IEEE 802.3 link-wide PAUSE only. A user request is always accepted
+as the preferred configuration, but it may not be possible to apply
+it in all link states.
+
+Key constraints:
+
+- Link-wide PAUSE is not valid on half-duplex links.
+- Link-wide PAUSE cannot be used together with Priority-based Flow Control
+  (PFC, IEEE 802.1Q Clause 36).
+- If autonegotiation is active and the link is currently down, the future
+  mode is not yet known.
+
+Because of these constraints, the kernel stores the requested setting
+and applies it only when the link is in a compatible state.
+
+Implications for userspace:
+
+1. Set once (the "wish"): the requested Rx/Tx PAUSE policy is
+   remembered even if it cannot be applied immediately.
+2. Applied conditionally: when the link comes up, the kernel enables
+   PAUSE only if the active mode allows it.
IDK about this section and also ...
 Keeping Close Tabs on the PAL
 =============================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index c869b7f8bce8..1f121108f236 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -931,9 +931,48 @@ struct kernel_ethtool_ts_info {
  * @get_pause_stats: Report pause frame statistics. Drivers must not zero
  *	statistics which they don't report. The stats structure is initialized
  *	to ETHTOOL_STAT_NOT_SET indicating driver does not report statistics.
- * @get_pauseparam: Report pause parameters
- * @set_pauseparam: Set pause parameters.  Returns a negative error code
- *	or zero.
+ *
+ * @get_pauseparam: Report the configured policy for link-wide PAUSE
+ *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
+ *      such that:
+ *      @autoneg:
+ *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
+ *              and is independent of generic link autonegotiation configured
+ *              via ethtool -s.
+ *              true  -> the device follows the negotiated result of pause
+ *                       autonegotiation (Pause/Asym);
+ *              false -> the device uses a forced MAC state independent of
+ *                       negotiation.
+ *      @rx_pause/@tx_pause:
+ *              represent the desired policy (preferred configuration).
+ *              In autoneg mode they describe what is to be advertised;
... this. IDK what you guys do in the Linux-managed code but the
convention for integrated devices is spelled out here:

/**
 * struct ethtool_pauseparam - Ethernet pause (flow control) parameters
 * @cmd: Command number = %ETHTOOL_GPAUSEPARAM or %ETHTOOL_SPAUSEPARAM
 * @autoneg: Flag to enable autonegotiation of pause frame use
 * @rx_pause: Flag to enable reception of pause frames
 * @tx_pause: Flag to enable transmission of pause frames
 *
 * Drivers should reject a non-zero setting of @autoneg when             <<< [1]
 * autoneogotiation is disabled (or not supported) for the link.         <<<
 *
 * If the link is autonegotiated, drivers should use
 * mii_advertise_flowctrl() or similar code to set the advertised
 * pause frame capabilities based on the @rx_pause and @tx_pause flags,
 * even if @autoneg is zero.  They should also allow the advertised
 * pause frame capabilities to be controlled directly through the
 * advertising field of &struct ethtool_cmd.
 *
 * If @autoneg is non-zero, the MAC is configured to send and/or
 * receive pause frames according to the result of autonegotiation.
 * Otherwise, it is configured directly based on the @rx_pause and
 * @tx_pause flags.
 */

Doesn't [1] contradict your description of kernel "storing the config"?
Also you're not reflecting this in the help for the set op..
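
To make the convention in the kerneldoc quoted above concrete, here is a
rough Python model of it. It is a sketch only, not driver code: the function
names and return shapes are hypothetical, and the Pause/Asym mapping is an
assumption modelled on what mii_advertise_flowctrl() is understood to do.

```python
def advertised_pause_bits(rx_pause, tx_pause):
    """Return (Pause, Asym) advertisement bits for the requested flags."""
    # Assumed mapping: rx sets both bits, tx then toggles Asym.
    pause = asym = bool(rx_pause)
    if tx_pause:
        asym = not asym
    return pause, asym


def set_pauseparam(autoneg, rx_pause, tx_pause, link_autoneg):
    """Model of the set op; returns (advertised_bits, forced_mac_state).

    advertised_bits is None when the link is not autonegotiated.
    forced_mac_state is None when the MAC follows the negotiated result.
    """
    # [1]: reject pause autoneg when link autoneg is disabled.
    if autoneg and not link_autoneg:
        raise ValueError("pause autoneg requested but link autoneg is off")

    # Advertisement follows rx/tx flags even if pause autoneg is zero.
    advertised = (
        advertised_pause_bits(rx_pause, tx_pause) if link_autoneg else None
    )

    # With pause autoneg off, the MAC is configured directly from the flags.
    forced = None if autoneg else (bool(rx_pause), bool(tx_pause))
    return advertised, forced


if __name__ == "__main__":
    print(set_pauseparam(False, True, True, link_autoneg=True))
    print(set_pauseparam(True, True, False, link_autoneg=True))
```

In this model the request in [1] fails up front rather than being stored as
a wish, which is where the two descriptions appear to diverge.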