Thread (36 messages), 6 authors, 2024-12-06

Re: [RFC/RFT v2 0/3] Introduce GRO support to cpumap codebase

From: Lorenzo Bianconi <hidden>
Date: 2024-11-26 17:03:05
Also in: bpf

From: Daniel Xu <redacted>
Date: Mon, 25 Nov 2024 16:56:49 -0600
quoted

On Mon, Nov 25, 2024, at 9:12 AM, Alexander Lobakin wrote:
quoted
From: Daniel Xu <redacted>
Date: Fri, 22 Nov 2024 17:10:06 -0700
quoted
Hi Olek,

Here are the results.

On Wed, Nov 13, 2024 at 03:39:13PM GMT, Daniel Xu wrote:
quoted

On Tue, Nov 12, 2024, at 9:43 AM, Alexander Lobakin wrote:
[...]
quoted
Baseline (again)

          Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)  Throughput (Mbit/s)
Run 1     3169917       0.00007295       0.00007871       0.00009343       21749.43
Run 2     3228290       0.00007103       0.00007679       0.00009215       21897.17
Run 3     3226746       0.00007231       0.00007871       0.00009087       21906.82
Run 4     3191258       0.00007231       0.00007743       0.00009087       21155.15
Run 5     3235653       0.00007231       0.00007743       0.00008703       21397.06
Average   3210372.8     0.000072182      0.000077814      0.00009087       21621.126

cpumap v2 Olek

          Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)  Throughput (Mbit/s)
Run 1     3253651       0.00007167       0.00007807       0.00009343       13497.57
Run 2     3221492       0.00007231       0.00007743       0.00009087       12115.53
Run 3     3296453       0.00007039       0.00007807       0.00009087       12323.38
Run 4     3254460       0.00007167       0.00007807       0.00009087       12901.88
Run 5     3173327       0.00007295       0.00007871       0.00009215       12593.22
Average   3239876.6     0.000071798      0.00007807       0.000091638      12686.316
Delta     0.92%         -0.53%           0.33%            0.85%            -41.32%
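
For anyone wanting to re-check the Delta row, here is a minimal sketch of the arithmetic, assuming Delta is the relative change of the five-run average against the baseline average. The run values are copied from the two tables above; the helper name `relative_delta_pct` is made up for illustration and is not part of any benchmark tool.

```python
from statistics import mean

# Per-run values copied from the "Baseline (again)" and "cpumap v2 Olek" tables.
baseline_mbps = [21749.43, 21897.17, 21906.82, 21155.15, 21397.06]
patched_mbps = [13497.57, 12115.53, 12323.38, 12901.88, 12593.22]

baseline_transactions = [3169917, 3228290, 3226746, 3191258, 3235653]
patched_transactions = [3253651, 3221492, 3296453, 3254460, 3173327]


def relative_delta_pct(baseline, patched):
    """Percent change of the patched average relative to the baseline average.

    Hypothetical helper; assumes Delta = (avg_patched - avg_baseline) / avg_baseline.
    """
    base_avg = mean(baseline)
    return (mean(patched) - base_avg) / base_avg * 100.0


tput_delta = relative_delta_pct(baseline_mbps, patched_mbps)
txn_delta = relative_delta_pct(baseline_transactions, patched_transactions)

print(f"throughput delta:   {tput_delta:.2f}%")
print(f"transactions delta: {txn_delta:.2f}%")
```

Under that assumption the throughput figure comes out at the -41.32% shown in the table, while the transaction count moves by under 1%.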


It's very interesting that we see -40% tput w/ the patches. I went back
Oh no, I messed up something =\

Could you please also test not the whole series, but patches 1-3 (up to
"bpf:cpumap: switch to GRO...") and 1-4 (up to "bpf: cpumap: reuse skb
array...")? Would be great to see whether this implementation works
worse right from the start or I just broke something later on.
Patches 1-3 reproduce the -40% tput numbers.
Ok, thanks! Seems like using the hybrid approach (GRO, but on top of
cpumap's kthreads instead of NAPI) really performs worse than switching
cpumap to NAPI.
quoted
With patches 1-4 the numbers get slightly worse (~1gbps lower) but it was noisy.
Interesting, I was sure patch 4 optimizes stuff... Maybe I'll give up on it.
quoted
tcp_rr results were unaffected.
@ Jakub,

Looks like I can't just use GRO without Lorenzo's conversion to NAPI, at
least for now =\ I took a look at the backlog NAPI and it could be used,
although we'd need a pointer in the backlog to the corresponding cpumap,
plus some synchronization point to make sure the backlog NAPI won't access
an already destroyed cpumap.

Maybe Lorenzo could take a look...
It seems to me the only difference would be that we would use the shared
backlog_napi kthreads instead of a dedicated kthread for each cpumap entry,
but we would still need the NAPI poll logic. I can look into it if you prefer
the shared kthread approach.
@Jakub: what do you think?

Regards,
Lorenzo
Thanks,
Olek
