Re: [PATCH net-next 1/2] sfc: default config to 1 channel/core in local NUMA node only
From: Íñigo Huguet <hidden>
Date: 2022-02-10 09:36:11
On Mon, Feb 7, 2022 at 5:53 PM Jakub Kicinski [off-list ref] wrote:
> On Mon, 7 Feb 2022 16:03:01 +0100 Íñigo Huguet wrote:
> > On Fri, Jan 28, 2022 at 11:27 PM Jakub Kicinski [off-list ref] wrote:
> > > On Fri, 28 Jan 2022 16:19:21 +0100 Íñigo Huguet wrote:
> > > > Handling channels from CPUs in different NUMA node can penalize performance, so better configure only one channel per core in the same NUMA node than the NIC, and not per each core in the system. Fallback to all other online cores if there are not online CPUs in local NUMA node.
> > >
> > > I think we should make netif_get_num_default_rss_queues() do a similar thing. Instead of min(8, num_online_cpus()) we should default to num_cores / 2 (that's physical cores, not threads). From what I've seen this appears to strike a good balance between wasting resources on pointless queues per hyperthread, and scaling up for CPUs which have many wimpy cores.
> >
> > I have a few busy weeks coming, but I can do this after that. With num_cores / 2 you divide by 2 because you're assuming 2 NUMA nodes, or just the plain number 2?
>
> Plain number 2, it's just a heuristic which seems to work okay. One queue per core (IOW without the /2) is still way too many queues for normal DC workloads.

Maybe it's because these are quite special workloads, but I have encountered problems related to queues in different NUMA nodes in 2 cases: XDP performance dropping to almost half with more RX queues because some of them were in a different node (the example in my patches), and a customer losing UDP packets, which was solved by reducing the number of RX queues so that all of them are in the same node.

-- 
Íñigo Huguet
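
For comparison, the three defaults discussed in this thread can be put side by side in a small user-space sketch. This is not the kernel or sfc code: the function names and parameters below are hypothetical, and the clamp to a minimum of 1 queue in the num_cores / 2 variant is my own assumption, since the thread does not say what should happen on a single-core system.

```c
/* Illustrative user-space sketch only; all names here are hypothetical. */

/* Current default as described in the thread: min(8, num_online_cpus()). */
static unsigned int default_queues_old(unsigned int online_cpus)
{
	return online_cpus < 8 ? online_cpus : 8;
}

/*
 * Heuristic proposed in the thread: physical cores / 2.
 * Assumption: clamp to at least 1 so a single-core system still gets a queue.
 */
static unsigned int default_queues_half_cores(unsigned int physical_cores)
{
	unsigned int n = physical_cores / 2;

	return n ? n : 1;
}

/*
 * Policy from the patch description: one channel per core in the NIC's
 * local NUMA node, falling back to all online cores if the local node
 * has no online CPUs.
 */
static unsigned int default_channels_local_node(unsigned int local_node_cores,
						unsigned int online_cores)
{
	return local_node_cores ? local_node_cores : online_cores;
}
```

Taking a hypothetical dual-socket machine with 16 cores per node and 2 threads per core (64 online CPUs, 32 physical cores), default_queues_old(64) gives 8, default_queues_half_cores(32) gives 16 and default_channels_local_node(16, 32) gives 16. The last two agree here only because this example has exactly 2 NUMA nodes; with a different node count num_cores / 2 no longer matches the local node's core count.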