Thread (13 messages), 5 authors, 2015-09-30

[RFC 0/2] VFIO: Add virtual MSI doorbell support.

From: Christoffer Dall <hidden>
Date: 2015-09-25 17:12:14
Also in: kvm, kvmarm, lkml

On Tue, Sep 22, 2015 at 11:09:14PM +0100, Marc Zyngier wrote:
On Tue, 4 Aug 2015 06:52:01 +0100
Bhushan Bharat [off-list ref] wrote:
-----Original Message-----
From: Pranavkumar Sawargaonkar [mailto:pranavkumar at linaro.org]
Sent: Tuesday, August 04, 2015 11:18 AM
To: Bhushan Bharat-R65777
Cc: kvm at vger.kernel.org; Alex Williamson; kvmarm at lists.cs.columbia.edu;
linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org;
christoffer.dall at linaro.org; marc.zyngier at arm.com; will.deacon at arm.com;
bhelgaas at google.com; arnd at arndb.de; rob.herring at linaro.org;
eric.auger at linaro.org; patches at apm.com; Yoder Stuart-B08248
Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support.

Hi Bharat,

On 28 July 2015 at 23:28, Alex Williamson [off-list ref]
wrote:
On Tue, 2015-07-28 at 17:23 +0000, Bhushan Bharat wrote:
Hi Alex,
-----Original Message-----
From: Alex Williamson [mailto:alex.williamson at redhat.com]
Sent: Tuesday, July 28, 2015 9:52 PM
To: Pranavkumar Sawargaonkar
Cc: kvm at vger.kernel.org; kvmarm at lists.cs.columbia.edu; linux-arm-
kernel at lists.infradead.org; linux-kernel at vger.kernel.org;
christoffer.dall at linaro.org; marc.zyngier at arm.com;
will.deacon at arm.com; bhelgaas at google.com; arnd at arndb.de;
rob.herring at linaro.org; eric.auger at linaro.org; patches at apm.com;
Bhushan Bharat-R65777; Yoder
Stuart-B08248
Subject: Re: [RFC 0/2] VFIO: Add virtual MSI doorbell support.

On Fri, 2015-07-24 at 14:33 +0530, Pranavkumar Sawargaonkar wrote:
In the current VFIO MSI/MSI-X implementation, the Linux host kernel
allocates MSI/MSI-X vectors when userspace requests them through vfio
ioctls.
VFIO creates irqfd mappings to notify userspace of MSI/MSI-X
interrupts when they are raised.
The guest OS sees an emulated MSI/MSI-X controller and receives an
interrupt when the kernel signals it via irqfd.

The host kernel allocates MSI/MSI-X vectors using standard Linux
routines like pci_enable_msix_range() and pci_enable_msi_range().
These routines, along with request_irq() in the host kernel, set up
MSI/MSI-X vectors with the physical MSI/MSI-X addresses provided by
the interrupt controller driver in the host kernel.

This means that when a device is assigned to the guest OS, the
MSI/MSI-X addresses present in the PCIe EP are the PAs programmed by
the host Linux kernel.

On x86 the MSI/MSI-X physical address range is reserved, the IOMMU is
aware of these addresses, and translation is bypassed for this
address range.

Unlike x86, ARM/ARM64 does not reserve an MSI/MSI-X physical address
range, and all transactions, including MSIs, go through the
IOMMU/SMMU without bypass.

This requires extending the current vfio MSI layer with additional
functionality for ARM/ARM64 by:
1. Programming an IOVA (referred to as an MSI virtual doorbell address)
   into the device's MSI vector as the MSI address.
   This IOVA will be provided by userspace based on the
   MSI/MSI-X addresses reserved for the guest.
2. Creating an IOMMU mapping between this IOVA and the
   physical address (PA) assigned to the MSI vector.
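
[Editorial illustration, not part of the original message.] The two steps above can be pictured with a toy Python model. This is only a sketch and not kernel code: `ToyIommu`, `setup_msi_doorbell`, the 4 KiB page size, and the example addresses are hypothetical names and values picked for illustration. The actual work would be done by the vfio and IOMMU drivers.

```python
PAGE_SIZE = 4096  # assumed IOMMU page granule for this sketch


class ToyIommu:
    """Hypothetical page-granular IOVA -> PA table; every access is translated."""

    def __init__(self):
        self.pages = {}  # IOVA page base -> PA page base

    def map_page(self, iova_page, pa_page):
        if iova_page % PAGE_SIZE or pa_page % PAGE_SIZE:
            raise ValueError("addresses must be page aligned")
        self.pages[iova_page] = pa_page

    def translate(self, iova):
        page = iova & ~(PAGE_SIZE - 1)
        offset = iova & (PAGE_SIZE - 1)
        if page not in self.pages:
            # No bypass: an unmapped MSI write faults like any other DMA.
            raise LookupError("IOMMU fault at IOVA %#x" % iova)
        return self.pages[page] + offset


def setup_msi_doorbell(iommu, doorbell_pa, iova_page):
    """Map the doorbell page, then return the address to program in the device."""
    offset = doorbell_pa & (PAGE_SIZE - 1)
    iommu.map_page(iova_page, doorbell_pa - offset)  # step 2: IOVA -> PA mapping
    return iova_page + offset                        # step 1: MSI address is the IOVA


iommu = ToyIommu()
msi_address = setup_msi_doorbell(iommu, doorbell_pa=0x79000040,
                                 iova_page=0x08000000)
# A device write to msi_address is translated to the physical doorbell.
assert iommu.translate(msi_address) == 0x79000040
```

In this model the mapping is created before the address is handed to the device, so there is no window in which an MSI write could fault.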

This RFC is proposing a solution for MSI/MSI-X passthrough for
ARM/ARM64.


Hi Pranavkumar,

Freescale has the same, or very similar, need, so any solution in
this space will need to work for both ARM and powerpc.  I'm not a
big fan of this approach as it seems to require the user to
configure MSI/X via ioctl and then call a separate ioctl mapping
the doorbells.  That's more code for the user, more code to get
wrong and potentially a gap between configuring MSI/X and enabling
mappings where we could see IOMMU faults.

If we know that doorbell mappings are required, why can't we set
aside a bank of IOVA space and have them mapped automatically as
MSI/X is being configured?  Then the user's need for special
knowledge and handling of this case is limited to setup.  The IOVA
space will be mapped and used as needed, we only need the user to
specify the IOVA space reserved for this.  Thanks,

We probably need a mix of both to support Freescale PowerPC and ARM
based machines.
In this mixed mode the kernel vfio driver will reserve some IOVA for
mapping MSI page/s.

If vfio is reserving pages independently from the user, this becomes
what Marc called "shaping" the VM and what x86 effectively does.  An
interface extension should expose these implicit regions so the user
can avoid them for DMA memory mapping.

If any other iova mapping overlaps with this then it will return an
error to user-space. Ideally this should be chosen in such a way
that it never overlaps, which is easy on some systems but can be
tricky on other systems like Freescale PowerPC. This is not
sufficient for at least Freescale PowerPC based SoCs. This is because
of a hardware limitation, where we need to fit this reserved iova
address within the aperture decided by user-space. So if we allow
user-space to change this reserved iova address to a value decided by
user-space itself then we can support both ARM/PowerPC based
solutions.

Yes, that's my intention, to allow userspace to specify the reserved
region.  I believe you have some additional restrictions on the number
of MSI banks available and whether MSI banks can be shared, but I
would hope that doesn't preclude a shared interface with ARM.
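
[Editorial illustration, not part of the original messages.] The reserved-region idea discussed above (userspace picks the MSI IOVA window, the window has to fit in the aperture, and DMA mappings that overlap it are rejected) can be sketched as simple range bookkeeping. `IovaSpace` and its methods are hypothetical and do not correspond to any existing VFIO interface. The addresses are made up.

```python
class IovaSpace:
    """Hypothetical bookkeeping for the IOVA space of one IOMMU domain."""

    def __init__(self, aperture_start, aperture_size):
        self.aperture = (aperture_start, aperture_start + aperture_size)
        self.msi_window = None  # (start, end), end exclusive
        self.dma_maps = []      # list of (start, end), end exclusive

    @staticmethod
    def _overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def _inside_aperture(self, r):
        return self.aperture[0] <= r[0] and r[1] <= self.aperture[1]

    def set_msi_window(self, start, size):
        """Userspace chooses where the MSI doorbell IOVAs live."""
        window = (start, start + size)
        if not self._inside_aperture(window):
            raise ValueError("MSI window must fit inside the aperture")
        if any(self._overlaps(window, m) for m in self.dma_maps):
            raise ValueError("MSI window overlaps an existing DMA mapping")
        self.msi_window = window

    def map_dma(self, iova, size):
        """Ordinary DMA mapping request; must stay clear of the MSI window."""
        req = (iova, iova + size)
        if not self._inside_aperture(req):
            raise ValueError("DMA mapping outside the aperture")
        if self.msi_window and self._overlaps(req, self.msi_window):
            raise ValueError("DMA mapping overlaps the reserved MSI window")
        self.dma_maps.append(req)


space = IovaSpace(aperture_start=0x0, aperture_size=0x40000000)
space.set_msi_window(0x08000000, 0x10000)  # reserved by userspace
space.map_dma(0x10000000, 0x100000)        # accepted: no overlap
```

Letting the caller of `set_msi_window` choose the start address is what would allow a platform with a constrained aperture to place the window where its hardware can reach it.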

I have some implementation ready/tested with this approach and if
this approach looks good then I can submit an RFC patch.

Yes, please post.  Thanks,

Could you please share a tentative timeline by which you will be posting your
patches?

I have not touched that code for a while, I am planning to send the
patch in a couple of weeks.

Have we made any progress on this subject? It looks like a lot of time
has passed, but I haven't seen anything. Did I miss it?

Pranav is going to respin his series, because we are clearly not making
progress on this front.

Thanks,
-Christoffer