Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable
From: Guangguan Wang <hidden>
Date: 2025-09-24 03:13:09
Also in:
linux-doc, linux-rdma, linux-s390, lkml
On 2025/9/19 22:55, Halil Pasic wrote:
On Tue, 9 Sep 2025 12:18:50 +0200 Halil Pasic [off-list ref] wrote: Can maybe Wen Gu and Guangguan Wang chime in. From what I read link->wr_rx_buflen can be either SMC_WR_BUF_SIZE that is 48 in which case it does not matter, or SMC_WR_BUF_V2_SIZE that is 8192, if !smc_link_shared_v2_rxbuf(lnk) i.e. max_recv_sge == 1. So we talk about roughly a factor of 170 here. For a large pref_recv_wr the back-off logic is still there to save us but I really would not say that this is how this is intended to work.
Hi Halil,
I think the root cause of the problem this patchset tries to solve is a mismatch between SMC_WR_BUF_CNT and max_conns per lgr (which is 255). Furthermore, I believe that 255 is not an optimal value for max_conns per lgr: too few connections per lgr waste memory, and too many lead to I/O queuing within a single QP (every WR posted with post_send to a single QP is initiated and completed in sequence).
We actually identified this problem long ago. In the Alibaba Cloud Linux distribution, we changed SMC_WR_BUF_CNT to 64 and reduced max_conns per lgr to 32 (for SMC-R V2.1). This configuration has worked well under a variety of workloads for a long time.
SMC-R V2.1 already supports negotiation of max_conns per lgr, so simply changing the value of the macro SMC_CONN_PER_LGR_PREFER influences the negotiation result. SMC-R V1.0 and SMC-R V2.0, however, do not support negotiation of max_conns per lgr. I think it is better to reduce SMC_CONN_PER_LGR_PREFER for SMC-R V2.1, but I do not have a good idea for SMC-R V1.0 and SMC-R V2.0.
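To put rough numbers on the mismatch, here is a minimal sketch that computes how many connections may end up competing for one WR buffer. The helper name conns_per_wr_buf is made up for illustration, and the value 16 for the default SMC_WR_BUF_CNT is an assumption that is not stated in this thread; 255, 64 and 32 are the values quoted above.

```c
#include <assert.h>

/* Values quoted in this thread. */
#define MAX_CONNS_PER_LGR    255 /* max_conns per lgr, fixed for V1.0/V2.0 */
#define TUNED_WR_BUF_CNT      64 /* SMC_WR_BUF_CNT in the tuned setup */
#define TUNED_MAX_CONNS       32 /* max_conns per lgr in the tuned setup */

/* Assumption: believed to be the default SMC_WR_BUF_CNT; not from this thread. */
#define ASSUMED_WR_BUF_CNT    16

/*
 * Hypothetical helper: worst-case number of connections that share
 * one WR buffer, rounded up.
 */
static unsigned int conns_per_wr_buf(unsigned int max_conns,
				     unsigned int wr_buf_cnt)
{
	assert(wr_buf_cnt > 0);
	return (max_conns + wr_buf_cnt - 1) / wr_buf_cnt;
}
```

Under these assumptions, conns_per_wr_buf(MAX_CONNS_PER_LGR, ASSUMED_WR_BUF_CNT) gives 16 connections per buffer, while conns_per_wr_buf(TUNED_MAX_CONNS, TUNED_WR_BUF_CNT) gives 1, i.e. the tuned setup has more buffers than connections.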
Maybe not supporting V2 on devices with max_recv_sge == 1 is a better choice,
assuming that a maximal V2 LLC msg needs to fit each and every receive
WR buffer. Which seems to be the case based on 27ef6a9981fe ("net/smc:
support SMC-R V2 for rdma devices with max_recv_sge equals to 1").
For RDMA devices whose max_recv_sge is 1, as mentioned in the commit log of the related patch, supporting V2 is better than an SMC_CLC_DECL_INTERR fallback, because the SMC_CLC_DECL_INTERR fallback is not a fast fallback and may heavily reduce the efficiency of connection setup on both the server and the client side.
For me the best course of action seems to be to send a V3 using link->wr_rx_buflen. I'm really not that knowledgeable about RDMA or the SMC-R protocol, but I'm happy to be part of the discussion on this matter. Regards, Halil
And a tiny suggestion regarding the risk you mentioned in the commit log
("Addressing this by simply bumping SMC_WR_BUF_CNT to 256 was deemed
risky, because the large-ish physically continuous allocation could fail
and lead to TCP fall-backs."). Non-physically-contiguous allocation (vmalloc/vzalloc, etc.) could
also be supported for wr buffers. The SMC-R snd_buf and rmb already support non-physically-contiguous
memory when sysctl_smcr_buf_type is set to SMCR_VIRT_CONT_BUFS or SMCR_MIXED_BUFS.
They can serve as an example of using non-physically-contiguous memory.
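To illustrate why the contiguous allocation is the risky part, here is a minimal sketch that computes the size and page order of one WR buffer array. The 4096-byte page size and the helper names wr_buf_bytes and alloc_order are assumptions for illustration; the buffer sizes 48 and 8192 and the count 256 come from this thread. It only multiplies count by size and does not model how the kernel actually sizes its tx and rx rings.

```c
#include <assert.h>
#include <stddef.h>

#define SMC_WR_BUF_SIZE      48   /* value quoted in this thread */
#define SMC_WR_BUF_V2_SIZE   8192 /* value quoted in this thread */
#define ASSUMED_PAGE_SIZE    4096 /* assumption: 4 KiB pages */

/* Hypothetical helper: bytes needed for buf_cnt buffers of buf_len each. */
static size_t wr_buf_bytes(size_t buf_cnt, size_t buf_len)
{
	return buf_cnt * buf_len;
}

/*
 * Hypothetical helper: smallest order such that 2^order pages cover
 * the request, which is what a physically contiguous allocation needs.
 */
static unsigned int alloc_order(size_t bytes)
{
	size_t pages = (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
	unsigned int order = 0;

	assert(bytes > 0);
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With 256 buffers of SMC_WR_BUF_V2_SIZE this comes to 2 MiB, an order-9 request under the 4 KiB page assumption, whereas 256 buffers of SMC_WR_BUF_SIZE need only an order-2 request. A virtually contiguous allocation does not need one high-order block, which is the point of the suggestion above.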
Regards,
Guangguan Wang