Re: Optimizing kernel compilation / alignments for network performance
From: Rafał Miłecki <zajec5@gmail.com>
Date: 2022-05-06 07:48:55
Also in:
linux-arm-kernel
On 5.05.2022 18:46, Felix Fietkau wrote:
> On 05.05.22 18:04, Andrew Lunn wrote:
>>> you'll see that most used functions are:
>>> v7_dma_inv_range
>>> __irqentry_text_end
>>> l2c210_inv_range
>>> v7_dma_clean_range
>>> bcma_host_soc_read32
>>> __netif_receive_skb_core
>>> arch_cpu_idle
>>> l2c210_clean_range
>>> fib_table_lookup
>>
>> There is a lot of cache management functions here. Might sound odd,
>> but have you tried disabling SMP? These cache functions need to
>> operate across all CPUs, and the communication between CPUs can slow
>> them down. If there is only one CPU, these cache functions get simpler
>> and faster.
>>
>> It just depends on your workload. If you have 1 CPU loaded to 100% and
>> the other 3 idle, you might see an improvement. If you actually need
>> more than one CPU, it will probably be worse.
>>
>> I've also found that some Ethernet drivers invalidate or flush too
>> much. If you are sending a 64 byte TCP ACK, all you need to flush is 64
>> bytes, not the full 1500 MTU. If you receive a TCP ACK, and then
>> recycle the buffer, all you need to invalidate is the size of the ACK,
>> so long as you can guarantee nothing has touched the memory above it.
>> But you need to be careful when implementing tricks like this, or you
>> can get subtle corruption bugs when you get it wrong.
>
> I just took a quick look at the driver. It allocates and maps rx
> buffers that can cover a packet size of BGMAC_RX_MAX_FRAME_SIZE = 9724.
> This seems rather excessive, especially since most people are going to
> use a MTU of 1500.
> My proposal would be to add support for making rx buffer size dependent
> on MTU, reallocating the ring on MTU changes. This should significantly
> reduce the time spent on flushing caches.

Oh, that's important too, it was changed by commit 8c7da63978f1 ("bgmac:
configure MTU and add support for frames beyond 8192 byte size"):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c7da63978f1672eb4037bbca6e7eac73f908f03
It lowered NAT speed with bgmac by 60% (362 Mbps → 140 Mbps).
I do all my testing with
#define BGMAC_RX_MAX_FRAME_SIZE 1536
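
To put rough numbers on the buffer size discussion above, here is a
minimal sketch in plain C (not driver code). It assumes 32-byte cache
lines and a cache-line-aligned buffer start; both are assumptions, not
something taken from the bgmac source. The helper names
rx_frame_size_for_mtu() and cache_lines_for_len() are hypothetical and
only exist for this illustration.

```c
#include <stddef.h>

/* Assumption: 32-byte cache lines; adjust for the actual SoC. */
#define CACHE_LINE_SIZE 32u

/* Standard Ethernet overheads, in bytes. */
#define ETH_HLEN_BYTES  14u	/* destination MAC + source MAC + EtherType */
#define VLAN_HLEN_BYTES  4u	/* one 802.1Q tag */
#define ETH_FCS_BYTES    4u	/* frame check sequence */

/*
 * Hypothetical helper: rx frame size needed for a given MTU.
 * A real driver may need extra room (e.g. a hardware rx header).
 */
static unsigned int rx_frame_size_for_mtu(unsigned int mtu)
{
	return mtu + ETH_HLEN_BYTES + VLAN_HLEN_BYTES + ETH_FCS_BYTES;
}

/*
 * Hypothetical helper: number of cache lines a range operation has to
 * walk for a buffer of `len` bytes, assuming the buffer starts on a
 * cache line boundary.
 */
static unsigned int cache_lines_for_len(unsigned int len)
{
	return (len + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}
```

Under these assumptions cache_lines_for_len(9724) is 304 while
cache_lines_for_len(1536) is 48, so each rx buffer covers roughly six
times more cache lines with the larger size. A buffer sized from
rx_frame_size_for_mtu(1500), i.e. 1522 bytes, also fits in 48 lines.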