Re: [patch 27/30] xen/events: Only force affinity mask for percpu interrupts

From: Andrew Cooper <hidden>
Date: 2020-12-11 23:15:11
Also in: dri-devel, intel-gfx, linux-gpio, linux-pci, linux-rdma, lkml, xen-devel

On 11/12/2020 21:27, Thomas Gleixner wrote:

On Fri, Dec 11 2020 at 09:29, boris ostrovsky wrote:

On 12/11/20 7:37 AM, Thomas Gleixner wrote:

On Fri, Dec 11 2020 at 13:10, Jürgen Groß wrote:

On 11.12.20 00:20, boris.ostrovsky@oracle.com wrote:

On 12/10/20 2:26 PM, Thomas Gleixner wrote:

Change the implementation so that the channel is bound to CPU0 at the XEN
level and leave the affinity mask alone. At startup of the interrupt, the
affinity will be assigned from the affinity mask and the XEN binding will
be updated.

If that's the case then I wonder whether we need this call at all and instead bind at startup time.

After some discussion with Thomas on IRC and xen-devel archaeology the
result is: this will be needed especially for systems running on a
single vcpu (e.g. small guests), as the .irq_set_affinity() callback
won't be called in this case when starting the irq.

On UP are we not then going to end up with an empty affinity mask? Or
are we guaranteed to have it set to 1 by interrupt generic code?

A UP kernel never looks at the affinity mask. The
chip::irq_set_affinity() callback is not invoked, so the mask is
irrelevant.

An SMP kernel on a UP machine sets CPU0 in the mask, so all is good.

This is actually why I brought this up in the first place --- a
potential mismatch between the affinity mask and Xen-specific data
(e.g. info->cpu and then protocol-specific data in event channel
code). Even if they are re-synchronized later, at startup time (for
SMP).

Which is not a problem either. The affinity mask is only relevant for
setting the affinity, but it's not relevant for delivery and never can
be.

I don't see anything that would cause a problem right now but I worry
that this inconsistency may come up at some point.

As long as the affinity mask does not become part of the event channel
magic, this should never matter.

Look at it from hardware:

interrupt is affine to CPU0

     CPU0 runs:
     
     set_affinity(CPU0 -> CPU1)
        local_irq_disable()
        
 --> interrupt is raised in hardware and pending on CPU0

        irq hardware is reconfigured to be affine to CPU1

        local_irq_enable()

 --> interrupt is handled on CPU0

the next interrupt will be raised on CPU1

So info->cpu which is registered via the hypercall binds the 'hardware
delivery' and whenever the new affinity is written it is rebound to some
other CPU and the next interrupt is then raised on this other CPU.

It's no different from the hardware example, at least as far as
I understood the code.
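
The sequence in the hardware example can be replayed with a small toy
model. This is only a sketch under stated assumptions: `IrqLine` and
`affinity_change_race` are hypothetical names invented for illustration,
not kernel or Xen code, and the model tracks nothing beyond where an
interrupt is pending and where the line is currently bound.

```python
class IrqLine:
    """Toy model of one interrupt line (illustrative, not kernel code)."""

    def __init__(self, cpu):
        self.bound_cpu = cpu    # CPU that receives newly raised interrupts
        self.pending_on = None  # CPU holding a raised but unhandled interrupt
        self.handled_on = []    # log of CPUs on which the handler ran

    def raise_irq(self):
        # A raised interrupt becomes pending on the CPU bound at that moment.
        self.pending_on = self.bound_cpu

    def rebind(self, cpu):
        # Changing the binding does not move an already pending interrupt.
        self.bound_cpu = cpu

    def enable_and_handle(self):
        # With interrupts enabled, the pending interrupt runs where it was raised.
        if self.pending_on is not None:
            self.handled_on.append(self.pending_on)
            self.pending_on = None


def affinity_change_race():
    """Replay set_affinity(CPU0 -> CPU1) with an interrupt arriving mid-change."""
    line = IrqLine(cpu=0)
    line.raise_irq()          # raised while CPU0 has interrupts disabled
    line.rebind(1)            # binding is reconfigured to CPU1
    line.enable_and_handle()  # local_irq_enable(): handled on CPU0
    line.raise_irq()          # the next interrupt follows the new binding
    line.enable_and_handle()
    return line.handled_on
```

In this model `affinity_change_race()` returns `[0, 1]`: the interrupt that
was already pending is still handled on CPU0, and only the next one arrives
on CPU1.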

Xen's event channels do have a couple of quirks.

Binding an event channel always results in one spurious event being
delivered.  This is to cover notifications which can get lost during the
bidirectional setup, or re-setups in certain configurations.

Binding an interdomain or pirq event channel always defaults to vCPU0.
There is no way to atomically set the affinity while binding.  I believe
the API predates SMP guest support in Xen, and no one has fixed it up since.

As a consequence, the guest will observe the event raised on vCPU0 as
part of setting up the event, even if it attempts to set a different
affinity immediately afterwards.  A little bit of care needs to be taken
when binding an event channel on vCPUs other than 0, to ensure that the
callback is safe with respect to any remaining state needing initialisation.
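
The ordering described above can be illustrated with a toy model. It is a
sketch under assumptions, not the Xen interface: `EventChannel` and
`bind_then_move` are hypothetical names, and the model simply treats the
setup event as arriving on vCPU0 at bind time, before the caller has
finished initialising its own state.

```python
class EventChannel:
    """Toy event channel: binding targets vCPU0 and raises one event."""

    def __init__(self, handler):
        self.handler = handler
        self.vcpu = None

    def bind(self):
        # Binding always defaults to vCPU0 and delivers one setup event there.
        self.vcpu = 0
        self.handler(self.vcpu)

    def set_affinity(self, vcpu):
        # The affinity can only be changed after the bind has completed.
        self.vcpu = vcpu

    def notify(self):
        self.handler(self.vcpu)


def bind_then_move(target_vcpu):
    """Bind, move to target_vcpu, finish setup, then receive a real event."""
    seen = []                 # (vcpu, state_ready) recorded per callback
    state = {"ready": False}

    def handler(vcpu):
        # The callback must cope with being invoked before setup is done.
        seen.append((vcpu, state["ready"]))

    chan = EventChannel(handler)
    chan.bind()                     # setup event is observed on vCPU0
    chan.set_affinity(target_vcpu)
    state["ready"] = True           # remaining initialisation finishes here
    chan.notify()
    return seen
```

In this model `bind_then_move(3)` returns `[(0, False), (3, True)]`: the first
callback runs on vCPU0 while the remaining state is still uninitialised, which
is the case the callback has to tolerate.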

Beyond this, there is nothing magic I'm aware of.

We have seen soft lockups before in certain scenarios, simply due to the
quantity of events hitting vCPU0 before irqbalance gets around to
spreading the load.  This is why there is an attempt to round-robin the
userspace event channel affinities by default, but I still don't see why
this would need custom affinity logic itself.
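
As a rough illustration of what round-robin spreading means here, the
following sketch assigns channels to CPUs in turn. It makes no claim about
how the Linux Xen event code implements this; `round_robin_affinities` is
a hypothetical helper, and the channel names and CPU list are made up.

```python
from itertools import cycle


def round_robin_affinities(channels, online_cpus):
    """Assign each channel to the next online CPU in turn (illustrative only)."""
    if not online_cpus:
        raise ValueError("at least one online CPU is required")
    cpus = cycle(online_cpus)
    # Channels are visited in order, so consecutive channels land on
    # consecutive CPUs and wrap around at the end of the CPU list.
    return {chan: next(cpus) for chan in channels}
```

With five channels and CPUs `[0, 1]`, the helper alternates 0, 1, 0, 1, 0, so
no single vCPU takes every event until a balancer intervenes.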

Thanks,

~Andrew
