Thread: 5 messages, 4 authors, 2024-08-23

RE: [PATCH net] net: mana: Fix race of mana_hwc_post_rx_wqe and new hwc response

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: 2024-08-22 14:30:27
Also in: bpf, linux-hyperv, linux-rdma, lkml, stable

-----Original Message-----
From: Christophe JAILLET <redacted>
Sent: Thursday, August 22, 2024 1:34 AM
To: Haiyang Zhang <haiyangz@microsoft.com>
Cc: ast@kernel.org; bpf@vger.kernel.org; daniel@iogearbox.net;
davem@davemloft.net; Dexuan Cui [off-list ref];
edumazet@google.com; hawk@kernel.org; jesse.brandeburg@intel.com;
john.fastabend@gmail.com; kuba@kernel.org; KY Srinivasan
[off-list ref]; leon@kernel.org; linux-hyperv@vger.kernel.org; linux-
kernel@vger.kernel.org; linux-rdma@vger.kernel.org; Long Li
[off-list ref]; netdev@vger.kernel.org; olaf@aepfle.de;
pabeni@redhat.com; Paul Rosswurm [off-list ref];
shradhagupta@linux.microsoft.com; ssengar@linux.microsoft.com;
stable@vger.kernel.org; stephen@networkplumber.org; tglx@linutronix.de;
vkuznets@redhat.com; wei.liu@kernel.org
Subject: Re: [PATCH net] net: mana: Fix race of mana_hwc_post_rx_wqe and
new hwc response

On 21/08/2024 at 22:42, Haiyang Zhang wrote:
The mana_hwc_rx_event_handler() / mana_hwc_handle_resp() calls
complete(&ctx->comp_event) before posting the wqe back. It's
possible that other callers, like mana_create_txq(), start the
next round of mana_hwc_send_request() before the posting of wqe.
If the HW responds fast enough, it can hit a no_wqe error on the HW
channel, and the response message is lost. The mana driver may then
fail to create queues and open, because it waits for the HW response
and times out.
Sample dmesg:
[  528.610840] mana 39d4:00:02.0: HWC: Request timed out!
[  528.614452] mana 39d4:00:02.0: Failed to send mana message: -110, 0x0
[  528.618326] mana 39d4:00:02.0 enP14804s2: Failed to create WQ object: -110

To fix it, move posting of rx wqe before complete(&ctx->comp_event).

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>
---
  .../net/ethernet/microsoft/mana/hw_channel.c  | 62 ++++++++++---------
  1 file changed, 34 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index cafded2f9382..a00f915c5188 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -52,9 +52,33 @@ static int mana_hwc_verify_resp_msg(const struct hwc_caller_ctx *caller_ctx,
  	return 0;
  }

+static int mana_hwc_post_rx_wqe(const struct hwc_wq *hwc_rxq,
+				struct hwc_work_request *req)
+{
+	struct device *dev = hwc_rxq->hwc->dev;
+	struct gdma_sge *sge;
+	int err;
+
+	sge = &req->sge;
+	sge->address = (u64)req->buf_sge_addr;
+	sge->mem_key = hwc_rxq->msg_buf->gpa_mkey;
+	sge->size = req->buf_len;
+
+	memset(&req->wqe_req, 0, sizeof(struct gdma_wqe_request));
+	req->wqe_req.sgl = sge;
+	req->wqe_req.num_sge = 1;
+	req->wqe_req.client_data_unit = 0;
Hi,

Unrelated to your patch, but this initialization is redundant: the
structure is already zeroed by the memset() a few lines above.
So why client_data_unit and not some other fields?

Agreed. This patch just moves the function for the bug fix.
We will do code cleanups in other patches.

Thanks,
- Haiyang
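
For readers following the thread, the ordering problem described in the commit message can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not the driver code: struct hwc_model and all model_* functions are hypothetical names, the RX queue is assumed to hold a single WQE, and the woken caller is assumed to get its next response from the HW immediately after completion is signaled (the worst case the commit message describes).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names exist in the driver. */
struct hwc_model {
	int posted_rx_wqes;	/* RX WQEs currently available to the HW */
	int lost_responses;	/* responses dropped because no WQE was posted */
	bool completed;		/* stands in for complete(&ctx->comp_event) */
};

/* The HW delivers a response: it consumes one RX WQE or drops the message. */
static bool model_hw_respond(struct hwc_model *m)
{
	if (m->posted_rx_wqes == 0) {
		m->lost_responses++;	/* models the no_wqe error */
		return false;
	}
	m->posted_rx_wqes--;
	return true;
}

/* Stands in for posting the RX WQE back to the queue. */
static void model_post_rx_wqe(struct hwc_model *m)
{
	m->posted_rx_wqes++;
}

/*
 * Response handler in both orderings. Right after completion is signaled,
 * the model assumes the waiter sends the next request and the HW answers
 * at once, before the handler gets any further.
 */
static int model_handle_resp(struct hwc_model *m, bool post_before_complete)
{
	if (post_before_complete)
		model_post_rx_wqe(m);	/* ordering after the fix */

	m->completed = true;
	model_hw_respond(m);		/* fast response to the next request */

	if (!post_before_complete)
		model_post_rx_wqe(m);	/* ordering before the fix: too late */

	return m->lost_responses;
}

/* Run one scenario and return the number of lost responses. */
static int model_run(bool post_before_complete)
{
	struct hwc_model m = { .posted_rx_wqes = 1 };
	bool ok = model_hw_respond(&m);	/* first response takes the only WQE */

	assert(ok);
	(void)ok;
	return model_handle_resp(&m, post_before_complete);
}
```

In this model, model_run(false) loses one response and model_run(true) loses none, which matches the reasoning for moving the posting of the rx wqe before complete(&ctx->comp_event). A real reproduction depends on timing between the handler and the waiting thread, which this sketch replaces with a fixed worst-case interleaving.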