Re: [PATCH 2/2] net: mana: Add standard counter rx_missed_errors
From: Erni Sri Satya Vennela <hidden>
Date: 2025-10-23 05:31:36
Also in:
linux-hyperv, linux-rdma, lkml
On Tue, Sep 16, 2025 at 03:22:54PM +0200, Paolo Abeni wrote:
> On 9/15/25 5:58 AM, Erni Sri Satya Vennela wrote:
>> Report standard counter stats->rx_missed_errors using
>> hc_rx_discards_no_wqe from the hardware.
>>
>> Add a dedicated workqueue to periodically run mana_query_gf_stats
>> every 2 seconds to get the latest info in eth_stats and define a
>> driver capability flag to notify hardware of the periodic queries.
>>
>> To avoid repeated failures and log flooding, the workqueue is not
>> rescheduled if mana_query_gf_stats fails.
>
> Can the failure root cause be a "transient" one? If so, this looks like
> a dangerous strategy; in such scenario, AFAICS, stats will be broken
> until the device is removed and re-probed.
>
> /P

After internal discussion, we plan to fix this issue with the following
approach: stop rescheduling the work queue only upon detecting an HWC
timeout. In that case:

1. Reset all stats to zero to avoid stale reporting.
2. Introduce a driver flag to detect the first occurrence of an HWC timeout.
3. Log a warn_once during subsequent calls to mana_get_stats64 to signal
   the issue.

Thanks,
Vennela
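
For illustration only, the three steps above could be modelled in plain
userspace C roughly as follows. This is a sketch, not the driver patch: the
struct, its field names, and both function names are hypothetical, and
treating -ETIMEDOUT as the HWC timeout indication is an assumption. Any other
error is treated as transient, so the previous values are kept and the work
stays scheduled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the per-device stats state. */
struct gf_stats_model {
	uint64_t rx_missed_errors; /* mirrors hc_rx_discards_no_wqe */
	bool hwc_timed_out;        /* hypothetical flag: first HWC timeout seen */
	bool warned;               /* models the "warn once" behaviour */
	bool work_scheduled;       /* models whether the periodic work is requeued */
};

/*
 * Models one run of the periodic stats work.
 * query_err: 0 on success, -ETIMEDOUT for an HWC timeout (assumption),
 * any other negative errno for a transient failure.
 */
static void gf_stats_work(struct gf_stats_model *m, int query_err,
			  uint64_t hw_discards)
{
	if (query_err == -ETIMEDOUT) {
		m->rx_missed_errors = 0;   /* step 1: no stale reporting */
		m->hwc_timed_out = true;   /* step 2: remember the timeout */
		m->work_scheduled = false; /* stop rescheduling */
		return;
	}

	if (query_err == 0)
		m->rx_missed_errors = hw_discards;

	/* Success or transient failure: keep polling. */
	m->work_scheduled = true;
}

/* Models the stats read path; warns once after an HWC timeout (step 3). */
static uint64_t get_stats64(struct gf_stats_model *m)
{
	if (m->hwc_timed_out && !m->warned) {
		fprintf(stderr, "stats unavailable: HWC timeout detected\n");
		m->warned = true;
	}
	return m->rx_missed_errors;
}
```

In this model a transient error such as -EIO leaves the last good counter
visible and the work scheduled, which addresses the concern about stats
staying broken until the device is removed and re-probed.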