Thread: 110 messages, 5 authors, 2015-05-11

Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

From: David Gibson <hidden>
Date: 2015-05-01 04:54:00
Also in: lkml

On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote:
On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
quoted
On 04/30/2015 05:22 PM, David Gibson wrote:
quoted
On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
quoted
At the moment only one group per container is supported.
POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
IOMMU group, so we can relax this limitation and support multiple groups
per container.
It's not obvious why allowing multiple TCE tables per PE has any
bearing on allowing multiple groups per container.

This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 
outcomes:
1. reusing the same IOMMU table for multiple groups - patch 31;
2. allowing dynamic create/remove of IOMMU tables - patch 32.

I can remove this one from the patchset and post it separately later, but
since 1..30 aim to support both 1) and 2), I think I had better keep them all
together (that might explain some of the changes I make in 1..30).
I think you are talking past each other :-)

But yes, having 2 tables per group is orthogonal to the ability of
having multiple groups per container.

The latter is made possible on P8 in large part because each PE has its
own DMA address space (unlike P5IOC2 or P7IOC where a single address
space is segmented).

Also, on P8 you can actually make the TVT entries point to the same
table in memory, thus removing the need to duplicate the actual
tables (though you still have to duplicate the invalidations). I would
however recommend only sharing the table that way within a chip/node.
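A minimal sketch of the P8 arrangement described above, assuming a simplified model: one TCE table in memory shared by every PE in the container, with the entry written once and the invalidation repeated per PE. All names here (struct shared_container, pe_invalidate(), shared_put_tce()) are hypothetical illustrations, not the kernel's actual structures or the TCE kill interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PES 4
#define TABLE_ENTRIES 16

/* One TCE table in memory; on P8 several TVT entries may point at it. */
struct tce_table {
    uint64_t entries[TABLE_ENTRIES];
};

/* Hypothetical per-PE state: only an invalidation counter is modelled. */
struct pe {
    int id;
    unsigned int invalidations; /* TCE invalidations this PE has received */
};

struct shared_container {
    struct tce_table *table; /* single table shared by all attached PEs */
    struct pe *pes[MAX_PES];
    size_t npes;
};

/* Hypothetical stand-in for the per-PE hardware TCE invalidation. */
static void pe_invalidate(struct pe *pe, size_t index)
{
    (void)index; /* a real implementation would invalidate this entry */
    pe->invalidations++;
}

/* Write the entry once, then invalidate it on every PE using the table. */
static void shared_put_tce(struct shared_container *c, size_t index,
                           uint64_t tce)
{
    assert(index < TABLE_ENTRIES);
    c->table->entries[index] = tce;
    for (size_t i = 0; i < c->npes; i++)
        pe_invalidate(c->pes[i], index);
}
```

The memory saving comes from the single entries[] array; the cost that remains is the loop over PEs for invalidation, which is why sharing is best kept within one chip/node.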

 .../..
quoted
quoted
quoted
-1) Only one IOMMU group per container is supported as an IOMMU group
-represents the minimal entity which isolation can be guaranteed for and
-groups are allocated statically, one per a Partitionable Endpoint (PE)
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+container is supported as an IOMMU table is allocated at the boot time,
+one table per a IOMMU group which is a Partitionable Endpoint (PE)
  (PE is often a PCI domain but not always).
quoted
quoted
I thought the more fundamental problem was that different PEs tended
to use disjoint bus address ranges, so even by duplicating put_tce
across PEs you couldn't have a common address space.
Yes. This is the problem with P7IOC and earlier. It *could* be doable on
P7IOC by making them the same PE but let's not go there.
quoted
Sorry, I am not following you here.

By duplicating put_tce, I can have multiple IOMMU groups on the same 
virtual PHB in QEMU; "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple 
groups per container" does this, and the address ranges will be the same.
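For contrast, a minimal sketch of the "duplicating put_tce" approach, assuming each IOMMU group keeps its own private copy of the table and the container replicates every mapping into all of them. struct dup_container and dup_put_tce() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 4
#define TABLE_ENTRIES 16

/* Hypothetical: each IOMMU group (PE) owns a private copy of the table. */
struct group_table {
    uint64_t entries[TABLE_ENTRIES];
};

struct dup_container {
    struct group_table *tables[MAX_GROUPS];
    size_t ngroups;
};

/* Replicate one mapping into every group's table so that the same bus
 * address translates identically behind every PE in the container. */
static void dup_put_tce(struct dup_container *c, size_t index, uint64_t tce)
{
    assert(index < TABLE_ENTRIES);
    for (size_t i = 0; i < c->ngroups; i++)
        c->tables[i]->entries[index] = tce;
}
```

This only yields a common address space if the same index means the same bus address behind every PE, which holds when each PE has its own DMA address space (P8) but not when PEs carve up one shared space.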
But that is only possible on P8 because only there do we have separate
address spaces between PEs.
quoted
What I cannot do on p5ioc2 is program the same table into multiple 
physical PHBs (or I could, but it is very different from IODA2, pretty 
ugly, and might not always be possible because I would have to allocate 
these pages from some common pool and face problems like fragmentation).
And P7IOC has a similar issue. The top bits of the DMA address index the
window on P7IOC within a shared address space. It's possible to
configure a TVT to cover multiple devices but with very serious
limitations.
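To illustrate why a segmented address space defeats a common container address space, here is a minimal sketch assuming the window is selected purely by the bus address bits above a fixed shift. WINDOW_SHIFT = 28 is a hypothetical value chosen for the example, not the real P7IOC layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bus address bits above WINDOW_SHIFT select the DMA window. */
#define WINDOW_SHIFT 28

static unsigned int window_index(uint64_t bus_addr)
{
    return (unsigned int)(bus_addr >> WINDOW_SHIFT);
}

/* A PE can only use bus addresses that fall inside its own window. */
static bool pe_can_use(unsigned int pe_window, uint64_t bus_addr)
{
    return window_index(bus_addr) == pe_window;
}
```

Two PEs assigned windows 0 and 1 can never both use the same bus address, so duplicating put_tce cannot give them one shared address space.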
OK. To check my understanding, does this sound reasonable:

  * The table_group more-or-less represents a PE, but in a way you can
    reference without first knowing the specific IOMMU hardware type.

  * When attaching multiple groups to the same container, the first PE
    (i.e. table_group) attached is used as a representative so that
    subsequent groups can be checked for compatibility with the first
    PE and therefore all PEs currently included in the container

     - This is why the table_group appears in some places where it
       doesn't seem sensible from a pure object ownership point of
       view
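The "first PE as representative" rule above can be sketched as follows, assuming compatibility means matching a couple of hypothetical properties (supported page sizes and window count). struct table_group here is a simplified stand-in, and attach_group() is a hypothetical helper, not the actual VFIO attach path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8

/* Hypothetical subset of the properties a table_group might expose. */
struct table_group {
    uint64_t pgsizes;     /* bitmap of supported IOMMU page sizes */
    uint32_t max_windows; /* number of DMA windows the PE supports */
};

struct container {
    struct table_group *groups[MAX_GROUPS];
    size_t ngroups;
};

/* Attach a group only if it matches the first (representative) group. */
static int attach_group(struct container *c, struct table_group *tg)
{
    if (c->ngroups == MAX_GROUPS)
        return -ENOSPC;
    if (c->ngroups > 0) {
        const struct table_group *first = c->groups[0];
        if (first->pgsizes != tg->pgsizes ||
            first->max_windows != tg->max_windows)
            return -EPERM;
    }
    c->groups[c->ngroups++] = tg;
    return 0;
}
```

Comparing only against the first group is enough because every member was checked against that same group when it was attached, so equality to the first implies equality to all current members.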

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachments

  • (unnamed) [application/pgp-signature] 819 bytes