Re: [PATCH RFC v1 1/2] genirq/affinity: add support for limiting managed interrupts

From: Jason Wang <jasowang@redhat.com>
Date: 2024-11-01 03:35:00
Also in: linux-block, linux-doc, linux-fsdevel, lkml

On Fri, Nov 1, 2024 at 11:12 AM mapicccy [off-list ref] wrote:



On Oct 31, 2024, at 18:50, Ming Lei [off-list ref] wrote:

On Thu, Oct 31, 2024 at 6:35 PM Thomas Gleixner [off-list ref] wrote:


On Thu, Oct 31 2024 at 15:46, guanjun@linux.alibaba.com wrote:

#ifdef CONFIG_SMP

+static unsigned int __read_mostly managed_irqs_per_node;
+static struct cpumask managed_irqs_cpumsk[MAX_NUMNODES] __cacheline_aligned_in_smp = {
+     [0 ... MAX_NUMNODES-1] = {CPU_BITS_ALL}
+};

+static void __group_prepare_affinity(struct cpumask *premask,
+                                  cpumask_var_t *node_to_cpumask)
+{
+     nodemask_t nodemsk = NODE_MASK_NONE;
+     unsigned int ncpus, n;
+
+     get_nodes_in_cpumask(node_to_cpumask, premask, &nodemsk);
+
+     for_each_node_mask(n, nodemsk) {
+             cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], premask);
+             cpumask_and(&managed_irqs_cpumsk[n], &managed_irqs_cpumsk[n], node_to_cpumask[n]);


How is this managed_irqs_cpumsk array protected against concurrency?

+             ncpus = cpumask_weight(&managed_irqs_cpumsk[n]);
+             if (ncpus < managed_irqs_per_node) {
+                     /* Reset node n to current node cpumask */
+                     cpumask_copy(&managed_irqs_cpumsk[n], node_to_cpumask[n]);


This whole logic is incomprehensible and, aside from the concurrency
problem, it's broken when CPUs are made present at run-time because these
cpu masks are static and represent the stale state of the last
invocation.
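
To illustrate the stale-state point, here is a minimal userspace sketch, not kernel code. It assumes a single 64-bit word standing in for a struct cpumask, and the names mask_t, weight(), prepare_static() and prepare_local() are hypothetical. prepare_static() mirrors the quoted hunk (narrow a persistent mask, reset it only when too few CPUs remain), while prepare_local() derives the mask from the current inputs on every call, which also leaves no shared state to protect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 64-bit word stands in for a struct cpumask. */
typedef uint64_t mask_t;

/* Count the set bits (a stand-in for cpumask_weight()). */
unsigned int weight(mask_t m)
{
	unsigned int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/* Persistent per-node state, initialised to all ones as in the patch. */
static mask_t static_node_mask = ~(mask_t)0;

/*
 * Mirrors the quoted hunk: narrow the persistent mask with the inputs
 * and reset it to the node's CPUs only if fewer than per_node remain.
 */
mask_t prepare_static(mask_t premask, mask_t node_cpus, unsigned int per_node)
{
	static_node_mask &= premask;
	static_node_mask &= node_cpus;
	if (weight(static_node_mask) < per_node)
		static_node_mask = node_cpus;
	return static_node_mask;
}

/* Alternative: compute from the current inputs only, no shared state. */
mask_t prepare_local(mask_t premask, mask_t node_cpus)
{
	return premask & node_cpus;
}

/* Walks through the scenario: CPUs 4-7 become present after the first call. */
void demo(void)
{
	/* First invocation: only CPUs 0-3 are present on the node. */
	assert(prepare_static(0x0F, 0x0F, 2) == 0x0F);
	/* CPUs 4-7 appear, but the persistent mask never widens. */
	assert(prepare_static(0xFF, 0xFF, 2) == 0x0F);
	/* The stateless variant sees all eight CPUs. */
	assert(prepare_local(0xFF, 0xFF) == 0xFF);
}
```

In this model the persistent mask keeps reporting CPUs 0-3 after CPUs 4-7 appear, because the reset path only triggers when the weight drops below the limit.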

Given the limitations of the x86 vector space, which is not going away
anytime soon, there are only two options IMO to handle such a scenario.

  1) Tell the nvme/block layer to disable queue affinity management


+1

There are other use cases, such as cpu isolation, which can benefit from
this way too.

https://lore.kernel.org/linux-nvme/20240702104112.4123810-1-ming.lei@redhat.com/

I wonder if we need to do the same for virtio-blk.

Thanks for the reminder. However, the patch at this link only modifies the NVMe driver,
and the same issue exists in the virtio net driver as well.

I guess you meant virtio-blk actually?

Guanjun


Thanks,

Thanks