Thread (4 messages), 4 authors, 2026-03-03

RE: [PATCH net-next, v2] net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: 2026-02-27 19:24:58
Also in: linux-hyperv, linux-rdma, lkml

-----Original Message-----
From: Dipayaan Roy <redacted>
Sent: Friday, February 27, 2026 3:15 AM
To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
[off-list ref]; wei.liu@kernel.org; Dexuan Cui
[off-list ref]; andrew+netdev@lunn.ch; davem@davemloft.net;
edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; leon@kernel.org;
Long Li [off-list ref]; Konstantin Taranov
[off-list ref]; horms@kernel.org;
shradhagupta@linux.microsoft.com; ssengar@linux.microsoft.com;
ernis@linux.microsoft.com; Shiraz Saleem [off-list ref];
linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org; Dipayaan Roy
[off-list ref]
Subject: [PATCH net-next, v2] net: mana: Trigger VF reset/recovery on
health check failure due to HWC timeout

The periodic GF stats query is used as a mechanism to monitor HWC
health. If this HWC command times out, it is a strong indication that
the device/SoC is in a faulty state and requires recovery.

Today, when a timeout is detected, the driver marks
hwc_timeout_occurred, clears cached stats, and stops rescheduling the
periodic work. However, the device itself is left in the same failing
state.

Extend the timeout handling path to trigger the existing MANA VF
recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item.
This is expected to initiate the appropriate recovery flow: a
suspend/resume is attempted first and, if that fails, a bus rescan is
triggered.

This change is intentionally limited to HWC command timeouts and does
not trigger recovery for errors reported by the SoC as a normal command
response.

Signed-off-by: Dipayaan Roy <redacted>
---
Changes in v2:
  - Added common helper, proper clearing of gc flags.
---
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thanks.
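
For readers following the thread without the diff, the control flow in the
patch description above can be sketched as a small user-space model. This is
not the driver code: every name below (model_ctx, model_handle_stats_result,
model_run_recovery, the recovery_step values) is hypothetical, and the
real patch queues a GDMA_EQE_HWC_RESET_REQUEST work item rather than setting
a flag. The handling of non-timeout errors here (state left untouched) is an
assumption; the description only says they do not trigger recovery.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the MANA driver's types. */
enum recovery_step {
	RECOVERY_NONE,
	RECOVERY_SUSPEND_RESUME,
	RECOVERY_BUS_RESCAN,
};

struct model_ctx {
	bool hwc_timeout_occurred;   /* set when the health query times out */
	bool reschedule_stats_work;  /* whether the periodic query keeps running */
	bool reset_queued;           /* models the queued reset-request work item */
	unsigned long cached_stats;  /* models the cached GF stats */
};

/* Completion of one periodic stats query; err is 0 or a negative errno. */
static void model_handle_stats_result(struct model_ctx *ctx, int err)
{
	assert(ctx != NULL);

	if (err != -ETIMEDOUT) {
		/*
		 * Success, or an error the SoC reported as a normal command
		 * response: no recovery is requested (assumption: state is
		 * otherwise left as it was).
		 */
		return;
	}

	/* Pre-existing behaviour described in the patch description. */
	ctx->hwc_timeout_occurred = true;
	ctx->cached_stats = 0;
	ctx->reschedule_stats_work = false;

	/* New behaviour: also request VF reset/recovery. */
	ctx->reset_queued = true;
}

/* Recovery service: suspend/resume first, bus rescan as the fallback. */
static enum recovery_step model_run_recovery(struct model_ctx *ctx,
					     bool suspend_resume_ok)
{
	assert(ctx != NULL);

	if (!ctx->reset_queued)
		return RECOVERY_NONE;

	ctx->reset_queued = false;
	return suspend_resume_ok ? RECOVERY_SUSPEND_RESUME : RECOVERY_BUS_RESCAN;
}
```

In this model, a result of -EIO leaves reset_queued false, while -ETIMEDOUT
clears the cached stats, stops rescheduling, and queues exactly one recovery
request that model_run_recovery() then consumes.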