Thread: 18 messages, 3 authors, 2022-02-19

Re: [PATCH net-next 1/2] sfc: default config to 1 channel/core in local NUMA node only

From: Íñigo Huguet <hidden>
Date: 2022-02-10 09:36:11

On Mon, Feb 7, 2022 at 5:53 PM Jakub Kicinski [off-list ref] wrote:
On Mon, 7 Feb 2022 16:03:01 +0100 Íñigo Huguet wrote:
On Fri, Jan 28, 2022 at 11:27 PM Jakub Kicinski [off-list ref] wrote:
On Fri, 28 Jan 2022 16:19:21 +0100 Íñigo Huguet wrote:
Handling channels from CPUs in a different NUMA node can penalize
performance, so it is better to configure only one channel per core in
the same NUMA node as the NIC, rather than one per core in the whole
system.

Fall back to all other online cores if there are no online CPUs in the
local NUMA node.
I think we should make netif_get_num_default_rss_queues() do a similar
thing. Instead of min(8, num_online_cpus()) we should default to
num_cores / 2 (that's physical cores, not threads). From what I've seen
this appears to strike a good balance between wasting resources on
pointless queues per hyperthread, and scaling up for CPUs which have
many wimpy cores.
I have a few busy weeks coming, but I can do this after that.

With num_cores / 2, do you divide by 2 because you're assuming 2 NUMA
nodes, or is it just the plain number 2?
Plain number 2, it's just a heuristic which seems to work okay.
One queue per core (IOW without the /2) is still way too many queues
for normal DC workloads.
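
For comparison, the two defaults discussed above can be written side by
side. This is only a userspace sketch: the function names, the assert()
precondition, and the floor of one queue are assumptions made for
illustration, and it is not the kernel's implementation of
netif_get_num_default_rss_queues().

```c
#include <assert.h>

/* Default described in the thread: min(8, num_online_cpus()). */
static int default_queues_capped(int online_cpus)
{
	return online_cpus < 8 ? online_cpus : 8;
}

/*
 * Suggested heuristic: half the number of physical cores (not threads).
 * The floor of 1 is an assumption so that a single-core system still
 * gets one queue.
 */
static int default_queues_half_cores(int physical_cores)
{
	int n;

	assert(physical_cores >= 0);
	n = physical_cores / 2;
	return n > 0 ? n : 1;
}
```

On a hypothetical machine with 16 physical cores and 32 threads, both
variants yield 8 queues; on one with 64 physical cores the capped default
still yields 8 while the heuristic yields 32.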
Maybe it's because these are quite special workloads, but I have
encountered problems related to queues in different NUMA nodes in 2
cases: XDP performance dropping to almost half with more RX queues
because of them being in a different node (the example in my patches),
and a customer losing UDP packets, which was solved by reducing the
number of RX queues so that all of them are in the same node.
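
As a rough illustration of the approach in the patch description (one
channel per core in the NIC's NUMA node, falling back to all online
cores), here is a userspace sketch. The struct cpu_info model and both
function names are hypothetical and are not taken from the sfc driver;
core ids are assumed to be unique system-wide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU description used only for this sketch. */
struct cpu_info {
	int core_id;	/* physical core id, assumed unique system-wide */
	int numa_node;	/* NUMA node the CPU belongs to */
	bool online;
};

/* Count distinct physical cores among online CPUs; node < 0 means any node. */
static int count_online_cores(const struct cpu_info *cpus, size_t n, int node)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		if (!cpus[i].online || (node >= 0 && cpus[i].numa_node != node))
			continue;
		/* Skip this CPU if an earlier matching CPU shares its core. */
		for (size_t j = 0; j < i; j++) {
			if (cpus[j].online &&
			    cpus[j].core_id == cpus[i].core_id &&
			    (node < 0 || cpus[j].numa_node == node)) {
				seen = true;
				break;
			}
		}
		if (!seen)
			count++;
	}
	return count;
}

/* One channel per local core; fall back to all online cores if none are local. */
static int default_channels(const struct cpu_info *cpus, size_t n, int nic_node)
{
	int local = count_online_cores(cpus, n, nic_node);

	return local > 0 ? local : count_online_cores(cpus, n, -1);
}
```

Sibling hyperthreads collapse into a single channel here because they
share a core_id, which matches the "one channel per core" wording.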

-- 
Íñigo Huguet