Thread (9 messages) 9 messages, 5 authors, 17d ago

Re: [PATCH net-next v2 2/2] net/smc: reduce TX slot contention with exclusive wait

From: Dust Li <dust.li@linux.alibaba.com>
Date: 2026-06-08 14:04:45
Also in: linux-rdma, linux-s390, lkml

On 2026-05-28 16:48:19, D. Wythe wrote:
quoted hunk ↗ jump to hunk
smc_wr_tx_get_free_slot() waits for a free TX slot with
wait_event_interruptible_timeout(). Since the wait_event family
enqueues waiters as non-exclusive, wake_up() may wake multiple
waiters even though only one can use the slot, causing
thundering-herd contention when slots are scarce.

Use an exclusive wait loop with prepare_to_wait_exclusive() so
wake_up() wakes only one waiter per freed slot.
smc_wr_wakeup_tx_wait() still uses wake_up_all() during link
teardown, so teardown behavior is unchanged.

Performance measured with netperf TCP_RR (63 flows, 200B write /
1000B read, 60s duration):

+-------------------------------+---------------+---------------+
| smcr_max_conns_per_lgr        | 32            | 255           |
|-------------------------------+---------------+---------------|
| before                        | 4.85 Gb/s     | 657.95 Mb/s   |
|-------------------------------+---------------+---------------|
| after                         | 5.01 Gb/s     | 2.2 Gb/s      |
+-------------------------------+---------------+---------------+

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_wr.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 130bc6c26fb3..3cb47f77130e 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -153,9 +153,11 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
			    struct smc_rdma_wr **wr_rdma_buf,
			    struct smc_wr_tx_pend_priv **wr_pend_priv)
{
+	unsigned long timeout = SMC_WR_TX_WAIT_FREE_SLOT_TIME;
	struct smc_link_group *lgr = smc_get_lgr(link);
	struct smc_wr_tx_pend *wr_pend;
	u32 idx = link->wr_tx_cnt;
+	DEFINE_WAIT(wait);
	int rc;

	*wr_buf = NULL;
@@ -165,17 +167,31 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
		if (rc)
			return rc;
	} else {
-		rc = wait_event_interruptible_timeout(
-			link->wr_tx_wait,
-			!smc_link_sendable(link) ||
-			lgr->terminating ||
-			(smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY),
-			SMC_WR_TX_WAIT_FREE_SLOT_TIME);
-		if (!rc) {
-			/* timeout - terminate link */
-			smcr_link_down_cond_sched(link);
-			return -EPIPE;
+		rc = 0;
+		for (;;) {
+			prepare_to_wait_exclusive(&link->wr_tx_wait, &wait,
+						  TASK_INTERRUPTIBLE);
+			if (!smc_link_sendable(link) || lgr->terminating ||
+			    smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
+				break;
+			timeout = schedule_timeout(timeout);
+			/* re-check */
+			if (!smc_link_sendable(link) || lgr->terminating ||
+			    smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
+				break;
+			if (!timeout) {
+				/* timeout - terminate link */
+				smcr_link_down_cond_sched(link);
+				break;
+			}
The change itself looks correct to me. But I think we should probably define
a wait_event_interruptible_exclusive_timeout() helper in include/linux/wait.h
rather than open-coding it in smc ?
+			if (signal_pending(current)) {
+				rc = -ERESTARTSYS;
+				break;
+			}
		}
One more thing, seems we changed the signal here, it's better to add a comment
or note it in the commit message.

Best regards,
Dust
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help