Thread (44 messages) 44 messages, 9 authors, 2021-05-03

Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

From: Nitesh Narayan Lal <hidden>
Date: 2021-04-08 18:49:45
Also in: linux-pci, lkml, netdev

On 4/7/21 11:18 AM, Nitesh Narayan Lal wrote:
On 4/6/21 1:22 PM, Jesse Brandeburg wrote:
quoted
Continuing a thread from a bit ago...

Nitesh Narayan Lal wrote:
quoted
quoted
After a little more digging, I found out why cpumask_local_spread change
affects the general/initial smp_affinity for certain device IRQs.

After the introduction of the commit:

    e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
Continuing the conversation about the above commit and adding Jesse.
I was trying to understand the problem that the commit message explains
"The default behavior of the kernel is somewhat undesirable as all
requested interrupts end up on CPU0 after registration.", I have also been
trying to reproduce this behavior without the patch but I failed in doing
so, maybe because I am missing something here.

@Jesse Can you please explain? FWIU IRQ affinity should be decided based on
the default affinity mask.
Thanks, Jesse for responding.
quoted
The original issue as seen, was that if you rmmod/insmod a driver
*without* irqbalance running, the default irq mask is -1, which means
any CPU. The older kernels (this issue was patched in 2014) used to use
that affinity mask, but the value programmed into all the interrupt
registers "actual affinity" would end up delivering all interrupts to
CPU0,
So does that mean the affinity mask for the IRQs was different wrt where
the IRQs were actually delivered?
Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
to 0 instead of -1?

I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
mask before removing the kernel module and after doing rmmod+insmod
and didn't find any difference.
quoted
 and if the machine was under traffic load incoming when the
driver loaded, CPU0 would start to poll among all the different netdev
queues, all on CPU0.

The above then leads to the condition that the device is stuck polling
even if the affinity gets updated from user space, and the polling will
continue until traffic stops.
quoted
The problem with the commit is that when we overwrite the affinity mask
based on the hinting mask we completely ignore the default SMP affinity
mask. If we do want to overwrite the affinity based on the hint mask we
should atleast consider the default SMP affinity.
For the issue where the IRQs don't follow the default_smp_affinity mask
because of this patch, the following are the steps by which it can be easily
reproduced with the latest linux kernel:

# Kernel
5.12.0-rc6+

# Other pramaeters in the cmdline
isolcpus=2-39,44-79 nohz=on nohz_full=2-39,44-79
rcu_nocbs=2-39,44-79

# cat /proc/irq/default_smp_affinity
0000,00000f00,00000003 [Corresponds to HK CPUs - 0, 1, 40, 41, 42 and 43]

# Create VFs and check IRQ affinity mask

/proc/irq/1423/iavf-ens1f1v3-TxRx-3
3
/proc/irq/1424/iavf-0000:3b:0b.0:mbx
0
40
42
/proc/irq/1425/iavf-ens1f1v8-TxRx-0
0
/proc/irq/1426/iavf-ens1f1v8-TxRx-1
1
/proc/irq/1427/iavf-ens1f1v8-TxRx-2
2
/proc/irq/1428/iavf-ens1f1v8-TxRx-3
3
...
/proc/irq/1475/iavf-ens1f1v15-TxRx-0
0
/proc/irq/1476/iavf-ens1f1v15-TxRx-1
1
/proc/irq/1477/iavf-ens1f1v15-TxRx-2
2
/proc/irq/1478/iavf-ens1f1v15-TxRx-3
3
/proc/irq/1479/iavf-0000:3b:0a.0:mbx
0
40
42
...
/proc/irq/240/iavf-ens1f1v3-TxRx-0
0
/proc/irq/248/iavf-ens1f1v3-TxRx-1
1
/proc/irq/249/iavf-ens1f1v3-TxRx-2
2


Trace dump:
----------
..
11551082:  NetworkManager-1734  [040]  8167.465719: vector_activate:    
            irq=1478 is_managed=0 can_reserve=1 reserve=0
11551090:  NetworkManager-1734  [040]  8167.465720: vector_alloc:
            irq=1478 vector=65 reserved=1 ret=0
11551093:  NetworkManager-1734  [040]  8167.465721: vector_update:      
            irq=1478 vector=65 cpu=42 prev_vector=0 prev_cpu=0
11551097:  NetworkManager-1734  [040]  8167.465721: vector_config:      
            irq=1478 vector=65 cpu=42 apicdest=0x00000200
11551357:  NetworkManager-1734  [040]  8167.465768: vector_alloc:        
            irq=1478 vector=46 reserved=0 ret=0

11551360:  NetworkManager-1734  [040]  8167.465769: vector_update:      
            irq=1478 vector=46 cpu=3 prev_vector=65 prev_cpu=42

11551364:  NetworkManager-1734  [040]  8167.465770: vector_config:      
            irq=1478 vector=46 cpu=3 apicdest=0x00040100
..

As we can see in the above trace the initial affinity for the IRQ 1478 was
correctly set as per the default_smp_affinity mask which includes CPU 42,
however, later on, it is updated with CPU3 which is returned from
cpumask_local_spread().
quoted
Maybe the right thing is to fix which CPUs are passed in as the valid
mask, or make sure the kernel cross checks that what the driver asks
for is a "valid CPU"?
Sure, if we can still reproduce the problem that your patch was fixing then
maybe we can consider adding a new API like cpumask_local_spread_irq in
which we should consider deafult_smp_affinity mask as well before returning
the CPU.
Didn't realize that netdev ml was not included, so adding that.

-- 
Nitesh
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help