Thread (21 messages), 5 authors, 2025-09-28

Re: [PATCH net-next v3 1/2] net/smc: make wr buffer count configurable

From: Halil Pasic <pasic@linux.ibm.com>
Date: 2025-09-26 10:13:02
Also in: linux-doc, linux-rdma, linux-s390, lkml

On Fri, 26 Sep 2025 10:44:00 +0800
Guangguan Wang [off-list ref] wrote:
+
+smcr_max_send_wr - INTEGER
+	So called work request buffers are SMCR link (and RDMA queue pair) level
+	resources necessary for performing RDMA operations. Since up to 255
+	connections can share a link group and thus also a link and the number
+	of the work request buffers is decided when the link is allocated,
+	depending on the workload it can be a bottleneck in a sense that threads
+	have to wait for work request buffers to become available. Before the
+	introduction of this control the maximal number of work request buffers
+	available on the send path used to be hard coded to 16. With this control
+	it becomes configurable. The acceptable range is between 2 and 2048.
+
+	Please be aware that all the buffers need to be allocated as a physically
+	continuous array in which each element is a single buffer and has the size
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+	like before having this control.
+
+	Default: 16
+
+smcr_max_recv_wr - INTEGER
+	So called work request buffers are SMCR link (and RDMA queue pair) level
+	resources necessary for performing RDMA operations. Since up to 255
+	connections can share a link group and thus also a link and the number
+	of the work request buffers is decided when the link is allocated,
+	depending on the workload it can be a bottleneck in a sense that threads
+	have to wait for work request buffers to become available. Before the
+	introduction of this control the maximal number of work request buffers
+	available on the receive path used to be hard coded to 16. With this control
+	it becomes configurable. The acceptable range is between 2 and 2048.
+
+	Please be aware that all the buffers need to be allocated as a physically
+	continuous array in which each element is a single buffer and has the size
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+	like before having this control.
+
+	Default: 48  
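
To put the physically contiguous allocation mentioned in the quoted text into numbers, here is a minimal sketch. It takes the 48-byte buffer size and the 2..2048 range from the documentation above; the 4 KiB page size and the helper names are assumptions for illustration, and the sketch counts only the buffer array itself, not any other per-link allocations the kernel makes.

```python
SMC_WR_BUF_SIZE = 48        # bytes per buffer, as stated in the quoted documentation
WR_MIN, WR_MAX = 2, 2048    # acceptable range, as stated in the quoted documentation
PAGE_SIZE = 4096            # assumption: 4 KiB pages


def wr_array_bytes(count: int) -> int:
    """Size in bytes of a contiguous array of `count` work request buffers."""
    if not WR_MIN <= count <= WR_MAX:
        raise ValueError(f"count must be in [{WR_MIN}, {WR_MAX}], got {count}")
    return count * SMC_WR_BUF_SIZE


def pages_needed(count: int) -> int:
    """Number of whole pages the array spans (ceiling division)."""
    return -(-wr_array_bytes(count) // PAGE_SIZE)


if __name__ == "__main__":
    for count in (16, 48, 2048):
        print(count, wr_array_bytes(count), pages_needed(count))
```

Under these assumptions the defaults (16 and 48 buffers) fit in a single page, while the maximum of 2048 buffers needs 98304 bytes, i.e. 24 contiguous pages.
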
Notice that the ratio of smcr_max_recv_wr to smcr_max_send_wr is set to 3:1. The
intention is that the peer QP's smcr_max_recv_wr is three times the local QP's
smcr_max_send_wr, and the local QP's smcr_max_recv_wr is three times the peer QP's
smcr_max_send_wr; it is not about making the local QP's smcr_max_recv_wr three times its
own smcr_max_send_wr. The purpose of this design is to guarantee that the receiving side
has enough receive WRs posted for incoming data when the peer QP does RDMA sends.
Otherwise RNR (Receiver Not Ready) may occur, leading to poor performance: on RNR the
packet is dropped and retransmitted by the RDMA transport layer.
Thank you Guangguan! I think we already had that discussion. 
Consider a scenario with multiple hosts that have different smcr_max_send_wr and
smcr_max_recv_wr configurations and mesh connections between them. It is difficult to
ensure that smcr_max_recv_wr/smcr_max_send_wr is 3:1 on the connected QPs between these
hosts, and it may even be hard to guarantee smcr_max_recv_wr > smcr_max_send_wr on the
connected QPs between these hosts.
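
The mesh concern can be made concrete with a small sketch. It treats the 3:1 rule as "the receiver's smcr_max_recv_wr is at least three times the sender's smcr_max_send_wr", which is one reading of the comment above and not a check the kernel performs; the host names, values and helper functions are hypothetical.

```python
from itertools import permutations

RATIO = 3  # recv:send ratio discussed in this thread


def recv_covers_send(sender_send_wr: int, receiver_recv_wr: int, ratio: int = RATIO) -> bool:
    """True if the receiver has at least `ratio` receive WRs per sender send WR."""
    return receiver_recv_wr >= ratio * sender_send_wr


def mismatched_pairs(hosts: dict) -> list:
    """Return (sender, receiver) pairs that violate the cross-host ratio.

    `hosts` maps a host name to its (smcr_max_send_wr, smcr_max_recv_wr) tuple.
    """
    bad = []
    for sender, receiver in permutations(hosts, 2):
        send_wr = hosts[sender][0]
        recv_wr = hosts[receiver][1]
        if not recv_covers_send(send_wr, recv_wr):
            bad.append((sender, receiver))
    return bad


if __name__ == "__main__":
    # Hypothetical mesh: hostC raised its send WR count without its peers
    # raising their receive WR counts.
    hosts = {"hostA": (16, 48), "hostB": (16, 48), "hostC": (64, 48)}
    print(mismatched_pairs(hosts))
```

With the defaults on every host the list is empty; once a single host raises smcr_max_send_wr on its own, every direction in which it sends is flagged.
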

It is not difficult IMHO. You just leave the knobs alone and you have
3:1 by default. If tuning is attempted, it needs to be done carefully.
At least with SMC-R V2 there is this whole EID business as well, so it
is reasonable to assume that the environment can be tuned in a coherent
fashion. E.g. whoever is calling the EID could call use smcr_max_recv_wr:=32 and smcr_max_send_wr:=
Therefore, I believe that if these values are made configurable, additional mechanisms must be
in place to prevent RNR from occurring. Otherwise we need to configure smcr_max_recv_wr
and smcr_max_send_wr carefully, or ensure that all hosts capable of establishing SMC-R connections
are configured with the same smcr_max_recv_wr and smcr_max_send_wr values.
Thank you for 