Thread: 21 messages, 5 authors, 2025-09-28

Re: [PATCH net-next v3 1/2] net/smc: make wr buffer count configurable

From: Guangguan Wang <hidden>
Date: 2025-09-28 03:05:55
Also in: linux-doc, linux-rdma, linux-s390, lkml


On 2025/9/26 18:30, Halil Pasic wrote:
On Fri, 26 Sep 2025 12:12:49 +0200
Halil Pasic [off-list ref] wrote:
quoted
On Fri, 26 Sep 2025 10:44:00 +0800
Guangguan Wang [off-list ref] wrote:
quoted
Notice that the ratio of smcr_max_recv_wr to smcr_max_send_wr is set to 3:1, with the
intention of ensuring that the peer QP's smcr_max_recv_wr is three times the local QP's
smcr_max_send_wr and the local QP's smcr_max_recv_wr is three times the peer QP's
smcr_max_send_wr, rather than making the local QP's smcr_max_recv_wr three times its own
smcr_max_send_wr. The purpose of this design is to guarantee sufficient receive WRs on
the receiving side when the peer QP does RDMA sends. Otherwise, RNR (Receiver
Not Ready) may occur, leading to poor performance (on RNR the packet is dropped and
retransmitted by the transport layer of the RDMA).
Sorry this was sent accidentally by the virtue of unintentionally
pressing the shortcut for send while trying to actually edit! 
quoted
Thank you Guangguan! I think we already had that discussion. 
Please have a look at this thread
https://lore.kernel.org/all/4c5347ff-779b-48d7-8234-2aac9992f487@linux.ibm.com/

I'm aware of this, but I think this problem needs to be solved on
a different level.
Oh, I see. Sorry for missing the previous discussion.

BTW, the RNR counter is exposed as a file such as '/sys/class/infiniband/mlx5_0/ports/1/hw_counters/rnr_nak_retry_err'.
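
For anyone who wants to watch that counter while tuning, here is a minimal userspace sketch in C. It only assumes the file holds a single unsigned decimal number, as sysfs counters usually do. The helper name read_counter() is made up for illustration, and the device name and port in the path above will differ per system.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * read_counter() is a hypothetical helper, not an existing API.
 */
int read_counter(const char *path, unsigned long long *value)
{
    FILE *f = fopen(path, "r");
    int parsed;

    if (!f)
        return -1;
    parsed = (fscanf(f, "%llu", value) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}
```

Sampling the value before and after a test run and comparing the two gives the number of RNR NAK retries seen during that run.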
quoted
quoted
Consider a scenario with multiple hosts that have different smcr_max_send_wr and
smcr_max_recv_wr configurations, with mesh connections between these hosts.
It is difficult to ensure that smcr_max_recv_wr/smcr_max_send_wr is 3:1 on the connected
QPs between these hosts, and it may even be hard to guarantee smcr_max_recv_wr > smcr_max_send_wr
on the connected QPs between these hosts.

It is not difficult IMHO. You just leave the knobs alone and you have
3:1 per default. If tuning is attempted that needs to be done carefully.
At least with SMC-R V2 there is this whole EID business as well, so it
is reasonable to assume that the environment can be tuned in a coherent
fashion. E.g. whoever is calling the EID could use smcr_max_recv_wr:=32
and smcr_max_send_wr:=96.
quoted
quoted
Therefore, I believe that if these values are made configurable, additional mechanisms must be
in place to prevent RNR from occurring. Otherwise we need to carefully configure smcr_max_recv_wr
and smcr_max_send_wr, or ensure that all hosts capable of establishing SMC-R connections are configured
with the same smcr_max_recv_wr and smcr_max_send_wr values.
I'm in favor of adding such mechanisms on top of this. Do you have
something particular in mind? Unfortunately I'm not knowledgeable enough
in the area to know what mechanisms you may mean. But I guess it is
patches welcome as always! Currently I would encourage users
to tune carefully.
AFAIK, flow control is the usual way; credit-based flow control may be enough. A credit is a
receive WR that has been posted and is available for use. The receiver counts a credit every time it
calls post_recv, and advertises its credits to the connected sender at a certain frequency. The sender
accumulates the credits advertised by the peer. Every time the sender posts a send WR that will consume
a receive WR on the receiver, it consumes a credit, provided it has credits left. Otherwise the sender
should hold the WR and wait for credits to be advertised by the peer.
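
To make the accounting concrete, here is a rough sketch in plain C. All names (credit_receiver, credit_sender, advertise_batch and the three functions) are hypothetical and do not exist in net/smc. How an advertisement would actually travel to the peer is deliberately left out, and "advertise at a certain frequency" is modelled simply as "advertise after every advertise_batch posted receive WRs", which is my assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Receiver-side state (hypothetical, for illustration only). */
struct credit_receiver {
    uint32_t unadvertised;    /* receive WRs posted but not yet advertised */
    uint32_t advertise_batch; /* advertise once this many have accumulated */
};

/* Sender-side state (hypothetical, for illustration only). */
struct credit_sender {
    uint32_t credits;         /* receive WRs the peer has advertised to us */
};

/*
 * Receiver: call after each successful post_recv.
 * Returns the number of credits to advertise to the peer now,
 * or 0 if the batch threshold has not been reached yet.
 */
uint32_t credit_on_post_recv(struct credit_receiver *rx)
{
    uint32_t grant;

    rx->unadvertised++;
    if (rx->unadvertised < rx->advertise_batch)
        return 0;
    grant = rx->unadvertised;
    rx->unadvertised = 0;
    return grant;
}

/* Sender: call when a credit advertisement arrives from the peer. */
void credit_on_advertise(struct credit_sender *tx, uint32_t granted)
{
    tx->credits += granted;
}

/*
 * Sender: call before post_send for a WR that consumes a receive WR
 * at the peer. Returns false if no credit is left, in which case the
 * WR has to be held until the peer advertises more credits.
 */
bool credit_try_consume(struct credit_sender *tx)
{
    if (tx->credits == 0)
        return false;
    tx->credits--;
    return true;
}
```

With this accounting the sender never posts more sends than the peer has receive WRs for, so RNR is avoided regardless of how the two sides have configured smcr_max_send_wr and smcr_max_recv_wr.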

But this requires support at the SMC-R protocol level, and it can be addressed as a later enhancement.
I do not know whether someone from Dust Li's team or someone from IBM is interested in picking this up.

Regards,
Guangguan Wang