Thread: 7 messages, 3 authors, 2022-05-05

Re: [PATCH net-next] net: axienet: Use NAPI for TX completion path

From: Robert Hancock <hidden>
Date: 2022-05-05 20:16:54
Also in: linux-arm-kernel

On Thu, 2022-05-05 at 12:56 -0600, Robert Hancock wrote:
On Thu, 2022-05-05 at 11:08 -0700, Jakub Kicinski wrote:
On Thu, 5 May 2022 17:33:39 +0000 Robert Hancock wrote:
On Wed, 2022-05-04 at 19:20 -0700, Jakub Kicinski wrote:
On Mon, 2 May 2022 19:30:51 +0000 Radhey Shyam Pandey wrote:  
Thanks for the patch. I assume for simulating heavy network load we
are using netperf/iperf. Do we have some details on the benchmark
before and after adding TX NAPI? I want to see the impact on
throughput.  
Seems like a reasonable ask, let's get the patch reposted 
with the numbers in the commit message.  
Didn't mean to ignore that request, looks like I didn't get Radhey's email
directly, odd.

I did a test with iperf3 from the board (Xilinx MPSoC ZU9EG platform) connected
to a Linux PC via a switch at 1G link speed. With TX NAPI in place I saw about
942 Mbps for TX rate, with the previous code I saw 941 Mbps. RX speed was also
unchanged at 941 Mbps. So no real significant change either way. I can spin
another version of the patch that includes these numbers.
Sounds like line rate, is there a difference in CPU utilization?
Some measurements on that from the TX load case - in both cases the RX and TX
IRQs ended up being split across CPU0 and CPU3 due to irqbalance:

Before:

CPU0 (RX): 1% hard IRQ, 13% soft IRQ
CPU3 (TX): 12% hard IRQ, 30% soft IRQ

After:

CPU0 (RX): <1% hard IRQ, 29% soft IRQ
CPU3 (TX): <1% hard IRQ, 21% soft IRQ

The hard IRQ time is definitely lower, and the total CPU usage is lower as well
(56% down to 50%). It's interesting that so much of the CPU load ended up on
the CPU with the RX IRQ though, presumably because the RX and TX IRQs are
triggering the same NAPI poll operation. Since they're separate IRQs that can
be on separate CPUs, it might be a win to use separate NAPI poll structures
for RX and TX so that both CPUs aren't trying to hit the same rings (TX and
RX)?
Indeed, it appears that separate RX and TX NAPI polling lowers the CPU usage
overall by a few percent as well as keeping the TX work on the same CPU as the
TX IRQ. I'll submit a v3 with these changes and will include the softirq
numbers in the commit text.
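
As a quick sanity check on the totals quoted above, here is a small Python
sketch (not part of the patch) that sums the per-CPU numbers. The dictionary
layout and the total() helper are only for illustration, and the "<1%" entries
are entered as 0, which is an assumption.

```python
# Percent of CPU time per core, copied from the measurements quoted above.
# Assumption: "<1%" is entered as 0, so the "after" figures are lower bounds
# (each "<1%" entry could add up to just under 1 point).
before = {
    "CPU0 (RX)": {"hard_irq": 1, "soft_irq": 13},
    "CPU3 (TX)": {"hard_irq": 12, "soft_irq": 30},
}
after = {
    "CPU0 (RX)": {"hard_irq": 0, "soft_irq": 29},
    "CPU3 (TX)": {"hard_irq": 0, "soft_irq": 21},
}


def total(sample, kind=None):
    """Sum usage across CPUs, optionally restricted to one kind of IRQ time."""
    return sum(
        value
        for per_cpu in sample.values()
        for name, value in per_cpu.items()
        if kind is None or name == kind
    )


for label, sample in (("before", before), ("after", after)):
    print(
        f"{label}: hard={total(sample, 'hard_irq')}% "
        f"soft={total(sample, 'soft_irq')}% total={total(sample)}%"
    )
```

With these inputs the totals come out to 56% before and 50% after, matching
the figures above. Split by type, hard IRQ time goes from 13% to under 2%,
while softirq time goes from 43% to 50%, so the saving is all on the hard IRQ
side.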
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com