Re: Optimizing kernel compilation / alignments for network performance
From: Rafał Miłecki <zajec5@gmail.com>
Date: 2022-05-10 12:58:30
Also in: linux-arm-kernel

On 6.05.2022 11:44, Arnd Bergmann wrote:
> On Fri, May 6, 2022 at 10:55 AM Rafał Miłecki [off-list ref] wrote:
>> On 6.05.2022 10:45, Arnd Bergmann wrote:
>>> On Fri, May 6, 2022 at 9:44 AM Rafał Miłecki [off-list ref] wrote:
>>>> With
>>>> echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
>>>> my NAT speeds were jumping between 2 speeds:
>>>> 284 Mbps / 408 Mbps
>>>
>>> Can you try using 'numactl -C' to pin the iperf processes to
>>> a particular CPU core? This may be related to the locality of
>>> the user process relative to where the interrupts end up.
>>
>> I run iperf on x86 machines connected to router's WAN and LAN ports.
>> It's meant to emulate end user just downloading from / uploading to
>> Internet some data.
>>
>> Router's only task is doing masquerade NAT here.
>
> Ah, makes sense. Can you observe the CPU usage to be on
> a particular core in the slow vs fast case then?

With
echo 0 > /sys/class/net/eth0/queues/rx-0/rps_cpus
NAT speed was varying between:
a) 311 Mb/s (CPUs load: 100% + 0%)
b) 408 Mb/s (CPUs load: 100% + 62%)

With
echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
NAT speed was varying between:
a) 290 Mb/s (CPUs load: 100% + 0%)
b) 410 Mb/s (CPUs load: 100% + 63%)

With
echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
NAT speed was stable:
a) 372 Mb/s (CPUs load: 100% + 26%)
b) 375 Mb/s (CPUs load: 82% + 100%)

With
echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
NAT speed was varying between:
a) 293 Mb/s (CPUs load: 100% + 0%)
b) 332 Mb/s (CPUs load: 100% + 17%)
c) 374 Mb/s (CPUs load: 81% + 100%)
d) 442 Mb/s (CPUs load: 100% + 75%)

After some extra debugging I found the reason for the varying CPU usage
& varying NAT speeds.

My router has a single switch so I use two VLANs:
eth0.1 - LAN
eth0.2 - WAN
(VLAN traffic is routed to the correct ports by the switch). On top of
that I have a "br-lan" bridge interface bridging eth0.1 and the wireless
interfaces.

For all that time I had /sys/class/net/br-lan/queues/rx-0/rps_cpus set
to 3. So bridge traffic was randomly handled by CPU 0 or CPU 1.

So if I assign a specific CPU core to each of the two interfaces, e.g.:
echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 2 > /sys/class/net/br-lan/queues/rx-0/rps_cpus
things get stable.

With the above I get a stable 419 Mb/s (CPUs load: 100% + 64%) on every
iperf session.
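
For anyone reproducing this: the values written to rps_cpus above are hexadecimal CPU bitmasks (bit N set means CPU N may process packets for that queue), so 1 selects CPU 0, 2 selects CPU 1, 3 allows both, and 0 leaves RPS disabled. A minimal sketch of that mapping follows. The helper names cpu_mask and set_rps and the SYSFS_ROOT variable are made up for illustration and are not part of any existing tool. It only touches the rx-0 queue, as in the tests above.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of CPU indices into the hex bitmask
# format used by rps_cpus (bit N set = CPU N is allowed).
# With no arguments it prints 0, i.e. the "RPS disabled" value.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# Hypothetical helper: write the mask for the rx-0 queue of one interface.
# SYSFS_ROOT is an assumption added for dry runs; it defaults to /sys.
set_rps() {
    iface=$1
    shift
    path="${SYSFS_ROOT:-/sys}/class/net/$iface/queues/rx-0/rps_cpus"
    cpu_mask "$@" > "$path"
}
```

Under those assumptions, the stable setup from this mail would be (as root):

    set_rps eth0 0      # same as: echo 1 > .../eth0/queues/rx-0/rps_cpus
    set_rps br-lan 1    # same as: echo 2 > .../br-lan/queues/rx-0/rps_cpus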