Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING | linux-hyperv

[patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 02/38] x86/init: Remove unused init ops · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq() · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq() · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
Re: [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq() · Thomas Gleixner <hidden> · 2020-08-25
[patch RFC 14/38] x86/msi: Consolidate MSI allocation · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 24/38] x86/xen: Consolidate XEN-MSI init · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init · Jürgen Groß <jgross@suse.com> · 2020-08-24
Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init · Thomas Gleixner <hidden> · 2020-08-24
Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init · Jürgen Groß <jgross@suse.com> · 2020-08-25
Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init · Thomas Gleixner <hidden> · 2020-08-25
[patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Jason Gunthorpe <jgg@nvidia.com> · 2020-08-21
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Jason Gunthorpe <jgg@nvidia.com> · 2020-08-21
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Jason Gunthorpe <jgg@nvidia.com> · 2020-08-22
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Thomas Gleixner <hidden> · 2020-08-22
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Jason Gunthorpe <jgg@nvidia.com> · 2020-08-22
Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING · Thomas Gleixner <hidden> · 2020-08-23
[patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain · Jürgen Groß <jgross@suse.com> · 2020-08-24
Re: [patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain · Thomas Gleixner <hidden> · 2020-08-25
[patch RFC 25/38] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
[patch RFC 36/38] platform-msi: Add device MSI infrastructure · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 15/38] x86/msi: Use generic MSI domain ops · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 35/38] platform-msi: Provide default irq_chip::ack · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI · Thomas Gleixner <hidden> · 2020-08-25
Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
[patch RFC 32/38] x86/irq: Make most MSI ops XEN private · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 31/38] x86/irq: Cleanup the arch_*_msi_irqs() leftovers · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 33/38] x86/irq: Add DEV_MSI allocation type · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Thomas Gleixner <hidden> · 2020-08-25
Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Thomas Gleixner <hidden> · 2020-08-25
Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks · Thomas Gleixner <hidden> · 2020-08-25
[patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device() · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 27/38] iommm/vt-d: Store irq domain in struct device · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 28/38] iommm/amd: Store irq domain in struct device · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 23/38] x86/xen: Rework MSI teardown · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 23/38] x86/xen: Rework MSI teardown · Jürgen Groß <jgross@suse.com> · 2020-08-24
[patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
[patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() · Jürgen Groß <jgross@suse.com> · 2020-08-24
[patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code · Bjorn Helgaas <helgaas@kernel.org> · 2020-08-25
[patch RFC 12/38] x86/irq: Consolidate UV domain allocation · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation · Boqun Feng <hidden> · 2020-08-26
Re: [patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation · Thomas Gleixner <hidden> · 2020-08-26
[patch RFC 09/38] x86/msi: Consolidate HPET allocation · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 06/38] iommu/amd: Consolidate irq domain getter · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency · Thomas Gleixner <hidden> · 2020-08-21
[patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval · Thomas Gleixner <hidden> · 2020-08-21
Re: [patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI · Jürgen Groß <jgross@suse.com> · 2020-08-22

Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2020-08-21 20:17:26
Also in: linux-iommu, linux-pci, lkml, xen-devel

On Fri, Aug 21, 2020 at 09:47:43PM +0200, Thomas Gleixner wrote:

On Fri, Aug 21 2020 at 09:45, Jason Gunthorpe wrote:

On Fri, Aug 21, 2020 at 02:25:02AM +0200, Thomas Gleixner wrote:

+static void ims_mask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl);

Just to be clear, this is exactly the sort of operation we can't do
with non-MSI interrupts. For a real PCI device to execute this it
would have to keep the data on die.

'We' means NVIDIA and your new device, right?

We'd like to use this in the current Mellanox NIC HW, e.g. the mlx5
driver. (NVIDIA acquired Mellanox recently.)

So if I understand correctly, the queue memory where the MSI
descriptor sits is in RAM.

Yes, IMHO that is the whole point of this 'IMS' stuff. If devices
had enough on-die memory, they could just use really big MSI-X
tables. Currently, due to on-die memory constraints, mlx5 is limited
to a few hundred MSI-X vectors.

Since MSI-X tables are exposed via MMIO they can't be 'swapped' to
RAM.

Moving away from MSI-X's MMIO access model allows the entries to be
swapped to RAM. The cost is that updating an entry becomes a
command/response operation, not an MMIO operation.

The HW already swaps the queues that generate the interrupts out to
RAM, so adding a bit of additional data to store the MSI address/data
is reasonable.

To give a sense of scale, the 'working set' for the NIC can in some
cases be hundreds of megabytes of data. System RAM is used to store
this, and precious on-die memory holds a dynamic active subset, much
like a processor cache.
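The processor-cache analogy can be sketched as a small direct-mapped cache: a few on-die slots hold recently used MSI entries, the full table lives in host RAM, and a miss fetches the entry from RAM (a DMA on real hardware), evicting whatever occupied the slot. This is a hypothetical userspace model, not a description of how the mlx5 hardware actually works; `ONDIE_SLOTS`, `struct ims_cache` and `ims_lookup()` are invented for illustration.

```c
#include <stdint.h>

#define ONDIE_SLOTS 4		/* hypothetical: tiny on-die capacity */
#define RAM_ENTRIES 1024	/* hypothetical: large RAM-backed table */

struct ims_entry {
	uint64_t addr;
	uint32_t data;
};

struct ims_cache {
	struct ims_entry ram[RAM_ENTRIES];	/* backing store in host RAM */
	struct ims_entry slot[ONDIE_SLOTS];	/* on-die copies */
	int tag[ONDIE_SLOTS];			/* RAM index held by each slot, -1 = empty */
	unsigned int hits, misses;
};

static void ims_cache_init(struct ims_cache *c)
{
	for (int i = 0; i < ONDIE_SLOTS; i++)
		c->tag[i] = -1;
	c->hits = c->misses = 0;
}

/*
 * Return the entry for vector @idx, filling its on-die slot on a miss.
 * Read-only model: entries are never modified on-die, so eviction needs
 * no write-back.
 */
static struct ims_entry ims_lookup(struct ims_cache *c, int idx)
{
	int s = idx % ONDIE_SLOTS;	/* direct-mapped placement */

	if (c->tag[s] == idx) {
		c->hits++;
	} else {
		c->misses++;
		c->slot[s] = c->ram[idx];	/* "DMA" from RAM, evicting the old entry */
		c->tag[s] = idx;
	}
	return c->slot[s];
}
```

With four slots, vectors 5 and 9 map to the same slot, so alternating between them misses every time; real hardware would presumably use a smarter placement, but the capacity/backing-store split is the point.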

How is that supposed to work if interrupt remapping is disabled?

The best we can do is issue a command to the device and spin/sleep
until completion. The device will serialize everything internally.

If the device has died, the driver has code to detect this and trigger
a PCI function reset, which will definitely stop the interrupt.

So the implementation of these functions would be to push any change
onto a command queue, trigger the device to DMA the command, spin/sleep
until the device returns a response, and then continue on. If the
device doesn't return a response within a time window, trigger a WQ to
do a full device reset.
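The flow described above (post the command, wait for the response, fall back to a reset on timeout) might look roughly like this userspace sketch. Everything here is hypothetical: `struct fake_dev` simulates the device so the example runs single-threaded, the timeout is an iteration budget rather than a real deadline, and `reset_pending` stands in for queueing reset work on a workqueue.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: one command slot plus a completion flag. */
struct fake_dev {
	bool alive;		/* false simulates a dead device */
	bool cmd_pending;
	uint32_t cmd;		/* e.g. "mask vector N" */
	bool done;
	uint32_t applied;	/* last command the device executed */
	bool reset_pending;	/* stands in for queueing reset work on a WQ */
};

/* Device side: on real HW this happens asynchronously after doorbell + DMA. */
static void dev_step(struct fake_dev *d)
{
	if (d->alive && d->cmd_pending) {
		d->applied = d->cmd;
		d->cmd_pending = false;
		d->done = true;
	}
}

/*
 * Driver side: post the command, then poll for completion. A kernel
 * driver would use a jiffies/ktime deadline and cpu_relax() or
 * usleep_range() between polls instead of an iteration budget.
 */
static int ims_cmd_sync(struct fake_dev *d, uint32_t cmd, unsigned int budget)
{
	d->cmd = cmd;
	d->done = false;
	d->cmd_pending = true;		/* "ring the doorbell" */

	for (unsigned int i = 0; i < budget; i++) {
		dev_step(d);		/* simulated device progress */
		if (d->done)
			return 0;
	}
	d->reset_pending = true;	/* no response: schedule a full function reset */
	return -ETIMEDOUT;
}
```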

The spin/sleep is only needed if the update has to be synchronous, so
things like rebalancing could just push the rebalancing work and
immediately return.
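A sketch of the synchronous/asynchronous split, assuming a hypothetical in-order command ring with sequence numbers: `ims_post()` queues a command and returns at once (the rebalancing case), while `ims_post_wait()` queues one and then waits until its sequence number has completed. The ring, the simulated `ring_dev_step()` and the iteration budget are illustrative assumptions, not an existing driver interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING 16

/* Hypothetical command ring: driver posts at head, device completes in order. */
struct cmd_ring {
	uint32_t cmd[RING];
	uint64_t head;		/* next sequence number to post */
	uint64_t completed;	/* sequence numbers below this are done */
	uint32_t last_applied;
};

/* Simulated device: completes at most one command per call. */
static void ring_dev_step(struct cmd_ring *r)
{
	if (r->completed < r->head) {
		r->last_applied = r->cmd[r->completed % RING];
		r->completed++;
	}
}

/* Asynchronous post (e.g. affinity rebalancing): queue and return the seqno. */
static bool ims_post(struct cmd_ring *r, uint32_t cmd, uint64_t *seq)
{
	if (r->head - r->completed == RING)
		return false;		/* ring full */
	r->cmd[r->head % RING] = cmd;
	*seq = r->head++;
	return true;
}

/* Synchronous post (e.g. mask): queue, then wait until it has completed. */
static bool ims_post_wait(struct cmd_ring *r, uint32_t cmd, unsigned int budget)
{
	uint64_t seq;

	if (!ims_post(r, cmd, &seq))
		return false;
	while (budget--) {
		if (r->completed > seq)
			return true;
		ring_dev_step(r);
	}
	return r->completed > seq;
}
```

Because completion is in order in this model, a synchronous command also implicitly waits for any asynchronous commands posted before it.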

If interrupt remapping is enabled then both are trivial because then the
irq chip can delegate everything to the parent chip, i.e. the remapping
unit.

I did like this notion that IRQ remapping could avoid the overhead of
the spin/sleep. Most of the use cases we have for this will require
the IOMMU anyhow.
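The delegation Thomas describes can be modeled with a minimal function-pointer hierarchy: the IMS-level chip's mask callback simply forwards to its parent, the remapping unit, which updates its own table without any round trip to the device. In the kernel, hierarchical irqdomains provide helpers such as `irq_chip_mask_parent()` for this; the `toy_*` types and functions below are a hypothetical userspace stand-in, not the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_irq_data;

/* Hypothetical, minimal model of an irq chip. */
struct toy_irq_chip {
	const char *name;
	void (*mask)(struct toy_irq_data *d);
};

/* One level of the interrupt hierarchy. */
struct toy_irq_data {
	struct toy_irq_chip *chip;
	struct toy_irq_data *parent;	/* NULL at the root of the hierarchy */
	bool masked;			/* state held by this level */
};

/* Remapping unit: masking is a local table update, no device round trip. */
static void remap_mask(struct toy_irq_data *d)
{
	d->masked = true;
}

/* IMS chip: forward to the parent, in the spirit of irq_chip_mask_parent(). */
static void mask_parent(struct toy_irq_data *d)
{
	d->parent->chip->mask(d->parent);
}

static struct toy_irq_chip remap_chip = { .name = "IR", .mask = remap_mask };
static struct toy_irq_chip ims_chip = { .name = "IMS", .mask = mask_parent };
```

Masking through the IMS level ends up setting state only in the remapping level, which is why the command/response cost disappears when remapping is enabled.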

I saw the idxd driver was doing something like this; I assume it
avoids trouble because it is a fake PCI device integrated with the
CPU, not on a real PCI bus?

That's how it is implemented as far as I understood the patches. It's
device memory, therefore iowrite32().

I don't know anything about idxd. Given the scale of interrupts it
needs, I assumed the idxd HW had some hidden swapping to RAM.

Since it is on-die with the CPU, I can imagine a bunch of ways Intel
could make MMIO-triggered swapping work that are not available to a
true PCIe device.

Jason