Re: [PATCH net-next] net: axienet: Use NAPI for TX completion path
From: Robert Hancock <hidden>
Date: 2022-05-05 20:16:54
Also in: linux-arm-kernel
On Thu, 2022-05-05 at 12:56 -0600, Robert Hancock wrote:
> On Thu, 2022-05-05 at 11:08 -0700, Jakub Kicinski wrote:
> > On Thu, 5 May 2022 17:33:39 +0000 Robert Hancock wrote:
> > > On Wed, 2022-05-04 at 19:20 -0700, Jakub Kicinski wrote:
> > > > On Mon, 2 May 2022 19:30:51 +0000 Radhey Shyam Pandey wrote:
> > > > > Thanks for the patch. I assume for simulating heavy network
> > > > > load we are using netperf/iperf. Do we have some details on
> > > > > the benchmark before and after adding TX NAPI? I want to see
> > > > > the impact on throughput.
> > > >
> > > > Seems like a reasonable ask, let's get the patch reposted with
> > > > the numbers in the commit message.
> > >
> > > Didn't mean to ignore that request, looks like I didn't get
> > > Radhey's email directly, odd.
> > >
> > > I did a test with iperf3 from the board (Xilinx MPSoC ZU9EG
> > > platform) connected to a Linux PC via a switch at 1G link speed.
> > > With TX NAPI in place I saw about 942 Mbps for TX rate, with the
> > > previous code I saw 941 Mbps. RX speed was also unchanged at
> > > 941 Mbps. So no real significant change either way. I can spin
> > > another version of the patch that includes these numbers.
> >
> > Sounds like line rate, is there a difference in CPU utilization?
>
> Some measurements on that from the TX load case - in both cases the
> RX and TX IRQs ended up being split across CPU0 and CPU3 due to
> irqbalance:
>
> Before:
> CPU0 (RX): 1% hard IRQ, 13% soft IRQ
> CPU3 (TX): 12% hard IRQ, 30% soft IRQ
>
> After:
> CPU0 (RX): <1% hard IRQ, 29% soft IRQ
> CPU3 (TX): <1% hard IRQ, 21% soft IRQ
>
> The hard IRQ time is definitely lower, and the total CPU usage is
> lower as well (56% down to 50%). It's interesting that so much of the
> CPU load ended up on the CPU with the RX IRQ though, presumably
> because the RX and TX IRQs are triggering the same NAPI poll
> operation. Since they're separate IRQs that can be on separate CPUs,
> it might be a win to use separate NAPI poll structures for RX and TX
> so that both CPUs aren't trying to hit the same rings (TX and RX)?
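
As a quick sanity check on the quoted totals, the per-CPU figures can be added up in a few lines of Python. This is only arithmetic on the numbers already given in the thread; treating "<1%" as 0 is an assumption made for this rough check, and the dictionary layout and helper names are illustrative.

```python
# Per-CPU percentages quoted in the thread for the TX load case.
# Assumption: "<1%" is rounded down to 0 for this rough check.
before = {
    "CPU0 (RX)": {"hard_irq": 1, "soft_irq": 13},
    "CPU3 (TX)": {"hard_irq": 12, "soft_irq": 30},
}
after = {
    "CPU0 (RX)": {"hard_irq": 0, "soft_irq": 29},
    "CPU3 (TX)": {"hard_irq": 0, "soft_irq": 21},
}


def total(sample):
    """Sum hard and soft IRQ time over all CPUs in one measurement."""
    return sum(sum(cpu.values()) for cpu in sample.values())


def by_kind(sample, kind):
    """Sum one kind of IRQ time ("hard_irq" or "soft_irq") over all CPUs."""
    return sum(cpu[kind] for cpu in sample.values())


print("total:   ", total(before), "->", total(after))
print("hard IRQ:", by_kind(before, "hard_irq"), "->", by_kind(after, "hard_irq"))
print("soft IRQ:", by_kind(before, "soft_irq"), "->", by_kind(after, "soft_irq"))
```

With these inputs the total goes from 56 to 50, matching the figure quoted above. Hard IRQ time drops from 13 to roughly 0, while soft IRQ time rises from 43 to 50, so most of the saved hard IRQ time reappears as softirq work.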
Indeed, it appears that separate RX and TX NAPI polling lowers the CPU usage overall by a few percent as well as keeping the TX work on the same CPU as the TX IRQ. I'll submit a v3 with these changes and will include the softirq numbers in the commit text.
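
To illustrate why splitting the poll contexts keeps TX work on the TX IRQ's CPU, here is a small userspace toy model in Python. It is not axienet or kernel code. It only mimics one property of NAPI: a context that is already scheduled is not rescheduled by a later IRQ, so its poll runs on the CPU that scheduled it first. The class and function names, the per-round IRQ ordering, and the packet counts are all hypothetical.

```python
from collections import defaultdict

# CPUs the two IRQs landed on in the test described in the thread.
RX_CPU, TX_CPU = 0, 3


class PollContext:
    """Toy stand-in for a NAPI context (hypothetical, not a kernel API)."""

    def __init__(self, rings):
        self.rings = rings        # ring names this context services
        self.scheduled_on = None  # CPU that will run the poll, if any

    def schedule(self, cpu):
        # An already-scheduled context stays on the CPU that scheduled it
        # first; a later IRQ on another CPU does not move it.
        if self.scheduled_on is None:
            self.scheduled_on = cpu

    def poll(self, pending, work_by_cpu):
        # Service every ring this context owns on the scheduling CPU.
        for ring in self.rings:
            packets = pending.pop(ring, 0)
            if packets:
                work_by_cpu[(self.scheduled_on, ring)] += packets
        self.scheduled_on = None


def simulate(contexts, rounds):
    """contexts maps ring name -> PollContext.
    rounds is a list of [(ring, packets), ...] in IRQ arrival order."""
    irq_cpu = {"rx": RX_CPU, "tx": TX_CPU}
    work_by_cpu = defaultdict(int)
    for irqs in rounds:
        pending = {}
        for ring, packets in irqs:
            pending[ring] = pending.get(ring, 0) + packets
            contexts[ring].schedule(irq_cpu[ring])
        for ctx in set(contexts.values()):
            if ctx.scheduled_on is not None:
                ctx.poll(pending, work_by_cpu)
    return dict(work_by_cpu)


# Made-up TX-heavy workload: many TX completions, a few RX packets (ACKs).
rounds = [[("rx", 4), ("tx", 64)], [("tx", 64)], [("rx", 2), ("tx", 32)]]

shared = PollContext(["rx", "tx"])
shared_result = simulate({"rx": shared, "tx": shared}, rounds)

separate_result = simulate(
    {"rx": PollContext(["rx"]), "tx": PollContext(["tx"])}, rounds
)

print("shared:  ", shared_result)
print("separate:", separate_result)
```

In this toy input the shared context handles 96 of the 160 TX completions on CPU0, because the RX IRQ scheduled the poll first in two of the three rounds. With separate contexts all 160 stay on CPU3. Real behavior depends on IRQ timing, coalescing, and poll budgets, none of which this model captures.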
--
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com