Thread (21 messages, 5 authors, 2025-09-28)

Re: [PATCH net-next v3 1/2] net/smc: make wr buffer count configurable

From: Dust Li <dust.li@linux.alibaba.com>
Date: 2025-09-28 02:12:59
Also in: linux-doc, linux-rdma, linux-s390, lkml

On 2025-09-28 10:02:43, Dust Li wrote:
On 2025-09-28 00:55:15, Halil Pasic wrote:
On Thu, 25 Sep 2025 13:25:40 +0200
Halil Pasic [off-list ref] wrote:
[...]
@@ -683,6 +678,8 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 	};
 	int rc;
 
+	qp_attr.cap.max_send_wr = 3 * lnk->lgr->max_send_wr;
+	qp_attr.cap.max_recv_wr = lnk->lgr->max_recv_wr;    
Possibly:

	cap = max(3 * lnk->lgr->max_send_wr, lnk->lgr->max_recv_wr);
	qp_attr.cap.max_send_wr = cap;
	qp_attr.cap.max_recv_wr = cap;

to avoid assumptions about the relative values of `max_send_wr` and `max_recv_wr`.
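A minimal userspace sketch contrasting the two sizing options discussed here. The structs `wr_limits` and `qp_depths` and both helper functions are hypothetical stand-ins for `lnk->lgr->max_send_wr`, `lnk->lgr->max_recv_wr` and the `qp_attr.cap` fields; the factor of 3 is taken from the patch hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for lnk->lgr->max_send_wr / max_recv_wr. */
struct wr_limits {
	uint32_t max_send_wr;	/* send WR buffers allocated */
	uint32_t max_recv_wr;	/* receive WR buffers allocated */
};

/* Hypothetical stand-in for qp_attr.cap.max_send_wr / max_recv_wr. */
struct qp_depths {
	uint32_t send;
	uint32_t recv;
};

/* As in the v3 hunk: size the send and receive queues independently. */
static struct qp_depths size_separately(const struct wr_limits *l)
{
	struct qp_depths d = {
		.send = 3 * l->max_send_wr,
		.recv = l->max_recv_wr,
	};

	return d;
}

/* Suggested alternative: one common depth that covers both queues. */
static struct qp_depths size_common(const struct wr_limits *l)
{
	uint32_t send_need = 3 * l->max_send_wr;
	uint32_t cap = send_need > l->max_recv_wr ? send_need : l->max_recv_wr;
	struct qp_depths d = { .send = cap, .recv = cap };

	return d;
}
```

For example, with 16 send buffers and 64 receive buffers, sizing separately requests 48 send and 64 receive WQEs, while the common cap requests 64 for both. The common cap never requests less than either separate value, at the cost of possibly over-provisioning one of the two queues.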
Can you explain a little more? I'm happy to make the change, but I would
prefer to understand why keeping qp_attr.cap.max_send_wr ==
qp_attr.cap.max_recv_wr is better. But if you tell me "Just trust me!", I will.
Due to a little accident, we ended up having a private conversation
on this, which I'm going to sum up quickly.

Paolo stated that he has no strong preference and that I should at
least add a comment, which I will do for v4. 

Unfortunately, I don't quite understand why qp_attr.cap.max_send_wr is 3
times the number of send WR buffers we allocate. My understanding
is that qp_attr.cap.max_send_wr refers to the number of send WQEs.
We have at most 2 RDMA Writes per RDMA Send, so a factor of 3 is necessary.
That is explained in the original comment; maybe it's better to keep it.
.cap = {
	/* include unsolicited rdma_writes as well,
	 * there are max. 2 RDMA_WRITE per 1 WR_SEND
	 */
	.max_send_wr = SMC_WR_BUF_CNT * 3,
	.max_recv_wr = SMC_WR_BUF_CNT * 3,
	.max_send_sge = SMC_IB_MAX_SEND_SGE,
	.max_recv_sge = lnk->wr_rx_sge_cnt,
	.max_inline_data = 0,
},
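A small sketch of the arithmetic behind the factor of 3, assuming (as the comment in the snippet above says) at most 2 unsolicited RDMA_WRITEs per WR_SEND. `MAX_RDMA_WRITES_PER_SEND` and both functions are illustrative names, not SMC code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the original comment: max. 2 RDMA_WRITE per WR_SEND. */
#define MAX_RDMA_WRITES_PER_SEND 2u

/* Worst case: every send buffer is in flight together with the
 * maximum number of RDMA_WRITEs, each occupying its own send WQE. */
static uint32_t send_wqe_worst_case(uint32_t send_bufs)
{
	return send_bufs * (1u + MAX_RDMA_WRITES_PER_SEND);
}

/* Send WQEs used by a concrete pattern: writes[i] is the number of
 * RDMA_WRITEs posted along with the i-th WR_SEND. */
static uint32_t send_wqes_posted(const uint32_t *writes, uint32_t nsends)
{
	uint32_t total = 0;

	for (uint32_t i = 0; i < nsends; i++) {
		assert(writes[i] <= MAX_RDMA_WRITES_PER_SEND);
		total += 1 + writes[i];	/* the WR_SEND itself plus its writes */
	}
	return total;
}
```

With 16 send buffers all in flight, each accompanied by 2 writes, 48 send WQEs are outstanding; any pattern with fewer writes stays below that bound.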
I assume that qp_attr.cap.max_send_wr == qp_attr.cap.max_recv_wr
is not something we would want to preserve.
IIUC, RDMA Write won't consume any RX WQE on the receive side, so I think
.max_recv_wr can be SMC_WR_BUF_CNT if we don't use RDMA_WRITE_IMM.
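To illustrate which operations use up receive WQEs: under InfiniBand verbs semantics, a SEND and an RDMA WRITE with immediate each consume one RX WQE at the responder, while a plain RDMA WRITE or RDMA READ does not. The `enum op` below is a hypothetical stand-in for the kernel's `enum ib_wr_opcode` (`IB_WR_SEND`, `IB_WR_RDMA_WRITE`, `IB_WR_RDMA_WRITE_WITH_IMM`, `IB_WR_RDMA_READ`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration only. */
enum op { OP_SEND, OP_RDMA_WRITE, OP_RDMA_WRITE_IMM, OP_RDMA_READ };

/* Only SEND and RDMA WRITE with immediate consume a remote RX WQE. */
static bool consumes_remote_rx_wqe(enum op op)
{
	return op == OP_SEND || op == OP_RDMA_WRITE_IMM;
}

/* Number of RX WQEs the peer must have posted for a sequence of ops. */
static uint32_t rx_wqes_needed(const enum op *ops, uint32_t n)
{
	uint32_t need = 0;

	for (uint32_t i = 0; i < n; i++)
		if (consumes_remote_rx_wqe(ops[i]))
			need++;
	return need;
}
```

So a burst of 2 RDMA_WRITEs followed by 1 WR_SEND needs only one RX WQE at the peer, which is why the factor of 3 is not needed on the receive side as long as RDMA_WRITE_IMM is not used.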
I kept thinking about this a bit more, and I realized that max_recv_wr
should be larger than SMC_WR_BUF_CNT.

Since receive WQEs are posted in a softirq context, their posting may be
delayed. Meanwhile, the sender may already have received the TX
completion (CQE), which only tells it that the message landed in an RX
WQE on the peer, and continue sending new messages. If the receiver’s
post_recv() (i.e., reposting of RX WQEs) is delayed in this case, an RNR
(Receiver Not Ready) condition can easily occur.
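A toy, discrete-time model of the RNR scenario described above. It is an illustration under simplifying assumptions, not SMC code: each tick the sender transmits one message per send buffer (the TX CQEs having freed them), each delivered message consumes one posted RX WQE, and the receiver reposts that WQE only `repost_delay` ticks later. Messages that find no posted RX WQE are counted as RNR events; real hardware would NAK and retry according to the QP's RNR retry settings, which the model ignores. `simulate_rnr` and `SIM_TICKS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_TICKS 64u

/* Returns the number of messages that found no posted RX WQE.
 * repost_delay == 1 means "reposted before the next burst". */
static uint32_t simulate_rnr(uint32_t tx_bufs, uint32_t rx_depth,
			     uint32_t repost_delay)
{
	uint32_t repost_at[SIM_TICKS];
	uint32_t rx_posted = rx_depth;
	uint32_t rnr = 0;

	assert(repost_delay >= 1);
	memset(repost_at, 0, sizeof(repost_at));

	for (uint32_t t = 0; t < SIM_TICKS; t++) {
		/* Receiver's softirq finally reposts WQEs consumed earlier. */
		rx_posted += repost_at[t];

		/* Sender transmits one message per free send buffer. */
		uint32_t delivered = tx_bufs < rx_posted ? tx_bufs : rx_posted;

		rnr += tx_bufs - delivered;
		rx_posted -= delivered;

		/* Consumed WQEs come back only after the repost delay. */
		if (t + repost_delay < SIM_TICKS)
			repost_at[t + repost_delay] += delivered;
	}
	return rnr;
}
```

With 16 send buffers and 16 RX WQEs, a one-tick repost delay produces no RNR, but a two-tick delay does; doubling the receive depth to 32 removes it again. In this model the receive depth has to cover tx_bufs * repost_delay, which is consistent with max_recv_wr needing to be larger than SMC_WR_BUF_CNT.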

Best regards,
Dust