Thread: 268 messages, 15 authors, 2021-06-08

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: David Gibson <hidden>
Date: 2021-05-13 06:07:18
Also in: linux-iommu, lkml

On Mon, May 03, 2021 at 01:15:18PM -0300, Jason Gunthorpe wrote:
On Thu, Apr 29, 2021 at 01:04:05PM +1000, David Gibson wrote:
Again, I don't know enough about VDPA to make sense of that.  Are we
essentially talking non-PCI virtual devices here?  In which case you
could define the VDPA "bus" to always have one-device groups.
It is much worse than that.

What these non-PCI devices need is for the kernel driver to be part of
the IOMMU group of the underlying PCI device but tell VFIO land that
"groups don't matter"
I don't really see a semantic distinction between "always one-device
groups" and "groups don't matter".  Really the only way you can afford
to not care about groups is if they're singletons.
Today mdev tries to fake this by using singleton iommu groups, but it
is really horrible and directly hacks up the VFIO IOMMU code to
understand these special cases. Intel was proposing more special
hacking in the VFIO IOMMU code to extend this to PASID.
At this stage I don't really understand why that would end up so
horrible.
When we get to a /dev/ioasid this is all nonsense. The kernel device
driver is going to have to tell drivers/iommu exactly what kind of
ioasid it can accept, be it a PASID inside a kernel owned group, a SW
emulated 'mdev' ioasid, or whatever.

In these cases the "group" idea has become a fiction that just creates
a pain.
I don't see how the group is a fiction in this instance.  You can
still have devices that can't be isolated, therefore you can have
non-singleton groups.
"Just reorganize VDPA to do something insane with the driver
core so we can create a dummy group to satisfy an unnecessary uAPI
restriction" is not a very compelling argument.

So if the nonsensical groups go away for PASID/mdev, where does it
leave the uAPI in other cases?
I don't think simplified-but-wrong is a good goal.  The thing about
groups is that if they're there, you can't just "not care" about them,
they affect you whether you like it or not.
You really can. If one thing claims the group then all the other group
devices become locked out.
Aside: I'm primarily using "group" to mean the underlying hardware
unit, not the vfio construct on top of it, I'm not sure that's been
clear throughout.

So.. your model assumes that every device has a safe quiescent state
where it won't do any harm until poked, whether its group is
currently kernel owned, or owned by a userspace that doesn't know
anything about it.

At minimum this does mean that in order to use one device in the group
you must have permission to use *all* the devices in the group -
otherwise you may be able to operate a device you don't have
permission to by DMAing to its registers from a device you do have
permission to.
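
To make the permission rule above concrete, here is a minimal sketch of
the check it implies. It is a toy model only: the names `may_use`,
`group_of` and `allowed` are hypothetical and do not correspond to any
kernel or VFIO interface.

```python
# Toy model, not kernel API: all names and the data layout are assumptions.
def may_use(device, group_of, allowed):
    """Return True only if `allowed` covers every device in `device`'s group.

    group_of: dict mapping device name -> group id (hardware isolation unit)
    allowed:  set of device names the caller has permission to use
    """
    group = group_of[device]
    members = {d for d, g in group_of.items() if g == group}
    # Devices in one group cannot be isolated from each other, so
    # permission on a single member is not sufficient.
    return members <= allowed


group_of = {"nic0": 1, "nic1": 1, "gpu0": 2}
print(may_use("nic0", group_of, {"nic0"}))          # False: nic1 shares the group
print(may_use("nic0", group_of, {"nic0", "nic1"}))  # True
print(may_use("gpu0", group_of, {"gpu0"}))          # True: singleton group
```

The singleton case is the only one where the group can be ignored,
which is the point made earlier about "groups don't matter".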

Whatever scripts are managing ownership of devices also need to know
about groups, because they need to put all the devices into that
quiescent state before the group can change ownership.
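
As an illustration of what such a script has to look up, the sketch
below lists every device sharing an IOMMU group with a given PCI
device, using the `iommu_group` symlink that Linux exposes under sysfs.
The function name `group_peers` and the `sysfs_root` parameter are
assumptions made for this example, and the device address in the usage
note is a placeholder.

```python
import os


def group_peers(bdf, sysfs_root="/sys"):
    """List all devices in the same IOMMU group as PCI device `bdf`.

    `bdf` is a PCI address such as "0000:01:00.0". The result includes
    `bdf` itself. `sysfs_root` is overridable only to allow testing
    against a fake tree; on a real system it is "/sys".
    """
    devices_dir = os.path.join(
        sysfs_root, "bus", "pci", "devices", bdf, "iommu_group", "devices"
    )
    # iommu_group is a symlink to /sys/kernel/iommu_groups/<N>; its
    # "devices" directory holds one entry per group member.
    return sorted(os.listdir(devices_dir))
```

A management script could call `group_peers("0000:01:00.0")` and then
quiesce or rebind each returned device before handing the group over.
How to put a particular device into a quiescent state is
device-specific and not covered by this sketch.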
The main point to understand is that groups are NOT an application
restriction! It is a whole system restriction that the operator needs
to understand and deal with. This is why things like dpdk don't care
about the group at all - there is nothing they can do with the
information.

If the operator says to run dpdk on a specific device then the
operator is the one that has to deal with all the other devices in the
group getting locked out.
Ok, I think I see your point there.
At best the application can make it more obvious that the operator is
doing something dangerous, but the current kernel API doesn't seem to
really support that either.

Jason
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
