Re: [PATCH RFC net-next] net/smc: transition to RDMA core CQ pooling

From: "D. Wythe" <alibuda@linux.alibaba.com>
Date: 2026-02-27 09:29:40
Also in: linux-rdma, linux-s390, lkml

On Fri, Feb 27, 2026 at 10:11:38AM +0530, Mahanta Jambigi wrote:


On 24/02/26 7:49 am, D. Wythe wrote:

On Fri, Feb 13, 2026 at 04:53:28PM +0530, Mahanta Jambigi wrote:


On 09/02/26 1:23 pm, D. Wythe wrote:

On Fri, Feb 06, 2026 at 04:58:23PM +0530, Mahanta Jambigi wrote:


On 02/02/26 3:18 pm, D. Wythe wrote:

The current SMC-R implementation relies on global per-device CQs
and manual polling within tasklets, which introduces severe
scalability bottlenecks due to global lock contention and tasklet
scheduling overhead, resulting in poor performance as concurrency
increases.

Refactor the completion handling to utilize the ib_cqe API and
standard RDMA core CQ pooling. This transition provides several key
advantages:

1. Multi-CQ: Shift from a single shared per-device CQ to multiple
link-specific CQs via the CQ pool. This allows completion processing
to be parallelized across multiple CPU cores, effectively eliminating
the global CQ bottleneck.

2. Leverage DIM: Utilizing the standard CQ pool with IB_POLL_SOFTIRQ
enables Dynamic Interrupt Moderation from the RDMA core, optimizing
interrupt frequency and reducing CPU load under high pressure.

3. O(1) Context Retrieval: Replaces the expensive wr_id based lookup
logic (e.g., smc_wr_tx_find_pending_index) with direct context retrieval
using container_of() on the embedded ib_cqe.

4. Code Simplification: This refactoring results in a reduction of
~150 lines of code. It removes redundant sequence tracking, complex lookup
helpers, and manual CQ management, significantly improving maintainability.
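
To illustrate point 3, here is a minimal userspace sketch of the
embedded-completion pattern. It is not code from the patch. The *_sketch
types are hypothetical stand-ins: cqe_sketch and wc_sketch mimic the shape
of struct ib_cqe and struct ib_wc, and tx_pend_sketch stands for whatever
per-send context the ib_cqe is embedded in. The container_of() macro is a
simplified version of the kernel one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(): recover a pointer to the enclosing
 * struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct wc_sketch;

/* Stand-in for struct ib_cqe: just a completion callback. */
struct cqe_sketch {
	void (*done)(struct wc_sketch *wc);
};

/* Stand-in for struct ib_wc: carries the cqe pointer set at post time. */
struct wc_sketch {
	struct cqe_sketch *wr_cqe;
	int status;
};

/* Hypothetical per-send context with the cqe embedded in it. */
struct tx_pend_sketch {
	int slot;
	int completed;
	struct cqe_sketch cqe;
};

/* Completion handler: one pointer subtraction instead of a table lookup
 * keyed by wr_id. */
static void tx_done_sketch(struct wc_sketch *wc)
{
	struct tx_pend_sketch *pend;

	assert(wc && wc->wr_cqe);
	pend = container_of(wc->wr_cqe, struct tx_pend_sketch, cqe);
	pend->completed = 1;
}

/* What a CQ poller does per completion: call the embedded callback. */
static void dispatch_sketch(struct wc_sketch *wc)
{
	wc->wr_cqe->done(wc);
}
```

The cost of finding the context is constant regardless of how many sends
are outstanding, which is the property the O(1) claim refers to.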

Performance test: redis-benchmark with at most 32 connections per QP.
Data format: requests per second (RPS); the percentage in parentheses
is the gain/loss relative to TCP.

| Clients | TCP      | SMC (original)      | SMC (cq_pool)       |
|---------|----------|---------------------|---------------------|
| c = 1   | 24449    | 31172  (+27%)       | 34039  (+39%)       |
| c = 2   | 46420    | 53216  (+14%)       | 64391  (+38%)       |
| c = 16  | 159673   | 83668  (-48%)  <--  | 216947 (+36%)       |
| c = 32  | 164956   | 97631  (-41%)  <--  | 249376 (+51%)       |
| c = 64  | 166322   | 118192 (-29%)  <--  | 249488 (+50%)       |
| c = 128 | 167700   | 121497 (-27%)  <--  | 249480 (+48%)       |
| c = 256 | 175021   | 146109 (-16%)  <--  | 240384 (+37%)       |
| c = 512 | 168987   | 101479 (-40%)  <--  | 226634 (+34%)       |

The results demonstrate that this optimization effectively resolves the
scalability bottleneck, with RPS increasing by over 110% at c=64
compared to the original implementation.
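
The percentages can be re-derived from the table. A small sketch of the
arithmetic follows; pct_change is a hypothetical helper name introduced
only for this illustration.

```c
/* Percentage change of 'value' relative to 'baseline'. */
static double pct_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

For the c = 64 row, pct_change(118192, 249488) is about 111, which matches
the "over 110%" figure, and pct_change(166322, 249488) is about 50, which
matches the +50% shown relative to TCP.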

I applied your patch to the latest kernel (6.19-rc8) and saw the
performance results below:

1) In my evaluation, I ran several *uperf* based workloads using a
request/response (RR) pattern, and I observed performance *degradation*
ranging from *4%* to *59%*, depending on the specific read/write sizes
used. For example, with a TCP RR workload using 50 parallel clients
(nprocs=50) sending a 200‑byte request and reading a 1000‑byte response
over a 60‑second run, I measured approximately 59% degradation compared
to SMC‑R original performance.

The only setting I changed was net.smc.smcr_max_conns_per_lgr = 32; all
other parameters were left at their default values. redis-benchmark is
itself a classic Request/Response (RR) workload, so my results contradict
yours. Since I'm unable to reproduce your results, it would be very
helpful if you could share the specific test configuration for my
analysis.

I used a simple client–server setup connected via 25 Gb/s RoCE_Express2
adapters on the same LAN (connection established via SMC-R v1). After
running the commands shown below, I observed a performance degradation
of up to 59%.

Server: smc_run uperf -s
Client: smc_run uperf -m rr1c-200x1000-50.xml

cat rr1c-200x1000-50.xml

<?xml version="1.0"?>
<profile name="TCP_RR">
	<group nprocs="50">
		<transaction iterations="1">
			<flowop type="connect" options="remotehost=server_ip protocol=tcp tcp_nodelay" />
		</transaction>
		<transaction duration="60">
			<flowop type="write" options="size=200"/>
			<flowop type="read" options="size=1000"/>
		</transaction>
		<transaction iterations="1">
			<flowop type="disconnect" />
		</transaction>
	</group>
</profile>

Using the exact same XML profile you provided, I tested this on a 25Gb
NIC. I observed no degradation. Instead, performance improved
significantly:

Original: ~1.08 Gb/s
Patched: ~5.1 Gb/s

I suspect the 59% drop might be due to connections falling back to TCP.
Could you check smcss -a during your test to see if the traffic is
actually running over SMC-R?

I have checked this. The connection was established in *SMCR* mode. I
also confirmed this with the 'smcr -d stats' command, which shows a TCP
fallback count of 0.

Fallback is ruled out, then, but a 59% drop is still quite massive,
especially since I'm seeing a significant improvement on my end.

Since I am unable to reproduce this locally, I would suggest analyzing
the CPU consumption and perf profiles in your environment. With a
regression this severe, the hotspots or differences should be fairly
obvious to identify.

I installed redis-server on the server machine and redis-benchmark on the
client machine, and I was able to establish the SMC-R connection using the
commands below. If you could share the exact commands you used to measure
redis-benchmark performance, I can try the same on my setup.

Server: smc_run redis-server --port <port_num> --save "" --appendonly no --protected-mode no --bind 0.0.0.0
Client: smc_run redis-benchmark -h <server_ip> -p <port_num> -n 10000 -c 50 -t ping_inline,ping_bulk -q

Here are the exact commands and scripts I used for the
redis-benchmark:

Server: smc_run redis-server --protected-mode no --save

Client: smc_run redis-benchmark -h <server_ip> -n 5000000 -t set --threads 3 -c <conn_num>

D. Wythe
