Re: [PATCH net-next v2 2/2] net/smc: reduce TX slot contention with exclusive wait
From: Dust Li <dust.li@linux.alibaba.com>
Date: 2026-06-08 14:04:45
Also in:
linux-rdma, linux-s390, lkml
On 2026-05-28 16:48:19, D. Wythe wrote:
quoted hunk ↗ jump to hunk
smc_wr_tx_get_free_slot() waits for a free TX slot with wait_event_interruptible_timeout(). Since the wait_event family enqueues waiters as non-exclusive, wake_up() may wake multiple waiters even though only one can use the slot, causing thundering-herd contention when slots are scarce. Use an exclusive wait loop with prepare_to_wait_exclusive() so wake_up() wakes only one waiter per freed slot. smc_wr_wakeup_tx_wait() still uses wake_up_all() during link teardown, so teardown behavior is unchanged. Performance measured with netperf TCP_RR (63 flows, 200B write / 1000B read, 60s duration): +-------------------------------+---------------+---------------+ | smcr_max_conns_per_lgr | 32 | 255 | |-------------------------------+---------------+---------------| | before | 4.85 Gb/s | 657.95 Mb/s | |-------------------------------+---------------+---------------| | after | 5.01 Gb/s | 2.2 Gb/s | +-------------------------------+---------------+---------------+ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> --- net/smc/smc_wr.c | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-)diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c index 130bc6c26fb3..3cb47f77130e 100644 --- a/net/smc/smc_wr.c +++ b/net/smc/smc_wr.c@@ -153,9 +153,11 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,struct smc_rdma_wr **wr_rdma_buf, struct smc_wr_tx_pend_priv **wr_pend_priv) { + unsigned long timeout = SMC_WR_TX_WAIT_FREE_SLOT_TIME; struct smc_link_group *lgr = smc_get_lgr(link); struct smc_wr_tx_pend *wr_pend; u32 idx = link->wr_tx_cnt; + DEFINE_WAIT(wait); int rc; *wr_buf = NULL;@@ -165,17 +167,31 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,if (rc) return rc; } else { - rc = wait_event_interruptible_timeout( - link->wr_tx_wait, - !smc_link_sendable(link) || - lgr->terminating || - (smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY), - SMC_WR_TX_WAIT_FREE_SLOT_TIME); - if (!rc) { - /* timeout - terminate link */ - smcr_link_down_cond_sched(link); - return -EPIPE; + rc = 0; + for (;;) { + prepare_to_wait_exclusive(&link->wr_tx_wait, &wait, + TASK_INTERRUPTIBLE); + if (!smc_link_sendable(link) || lgr->terminating || + smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY) + break; + timeout = schedule_timeout(timeout); + /* re-check */ + if (!smc_link_sendable(link) || lgr->terminating || + smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY) + break; + if (!timeout) { + /* timeout - terminate link */ + smcr_link_down_cond_sched(link); + break; + }
The change itself looks correct to me. But I think we should probably define a wait_event_interruptible_exclusive_timeout() helper in include/linux/wait.h rather than open-coding it in smc ?
+ if (signal_pending(current)) {
+ rc = -ERESTARTSYS;
+ break;
+ }
}One more thing, seems we changed the signal here, it's better to add a comment or note it in the commit message. Best regards, Dust