Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs | cgroups


Hi Jason,

On Mon, 22 Mar 2021 09:03:00 -0300, Jason Gunthorpe [off-list ref] wrote:

On Fri, Mar 19, 2021 at 11:22:21AM -0700, Jacob Pan wrote:
Hi Jason,

On Fri, 19 Mar 2021 10:54:32 -0300, Jason Gunthorpe [off-list ref]
wrote: 
On Fri, Mar 19, 2021 at 02:41:32PM +0100, Jean-Philippe Brucker
wrote:  
On Fri, Mar 19, 2021 at 09:46:45AM -0300, Jason Gunthorpe wrote:    
On Fri, Mar 19, 2021 at 10:58:41AM +0100, Jean-Philippe Brucker
wrote: 
Although there is no use for it at the moment (only two upstream
users and it looks like amdkfd always uses current too), I quite
like the client-server model where the privileged process does
bind() and programs the hardware queue on behalf of the client
process.    
This creates a lot of complexity: how does process A get a secure
reference to B? How does it access the memory in B to set up the
HW?
mm_access() for example, and passing addresses via IPC    
I'd rather the source process establish its own PASID and then pass
the rights to use it to some other process via FD passing than try to
go the other way. There are lots of security questions with something
like mm_access.
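
For readers unfamiliar with "FD passing": the generic Unix mechanism is an SCM_RIGHTS control message over an AF_UNIX socket. The sketch below shows only that mechanism. It is not a PASID interface; the descriptor being passed is an ordinary pipe standing in for whatever FD a driver might hand out, and `send_fd()`, `recv_fd()` and `fd_passing_selftest()` are illustrative names, not anything from this series. For brevity the self-test sends and receives within one process.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of ordinary data must accompany it */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {		/* union guarantees cmsghdr alignment of the buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the read end of a pipe across a socketpair and read through it. */
static int fd_passing_selftest(void)
{
	int sv[2], p[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
		return -1;
	if (send_fd(sv[0], p[0]) < 0)
		return -1;
	got = recv_fd(sv[1]);
	if (got < 0)
		return -1;
	/* The receiver gets a new descriptor number for the same pipe. */
	assert(got != p[0]);
	if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1)
		return -1;
	close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

The receiver ends up holding a reference to the same open file description, which is why the rights travel with the FD rather than with any mm lookup.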

Thank you all for the input. It sounds like we are OK to remove the mm
argument from iommu_sva_bind_device() and iommu_sva_alloc_pasid() for
now?

Let me try to summarize PASID allocation as below:

Interfaces    | Usage         | Limit  | bind¹ | User visible
/dev/ioasid²  | G-SVA/IOVA    | cgroup | No    | Yes
char dev³     | SVA           | cgroup | Yes   | No
iommu driver  | default PASID | no     | No    | No
kernel        | super SVA     | no     | yes   | No

¹ Allocated during SVA bind
² PASIDs allocated via /dev/ioasid are not bound to any mm, but their
  ownership is assigned to the process that does the allocation.
What does "not bound to a mm" mean?
I meant that the IOASID allocated via /dev/ioasid is in a clean state (just a
number). Its initial state is not bound to an mm, unlike sva_bind_device(),
where the IOASID is allocated at bind time.

The use case is to support guest SVA bind, where allocation and bind are in
two separate steps.
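
As a rough illustration of the difference between the two flows, here is a toy userspace model in C. Everything in it is hypothetical: `toy_alloc()`, `toy_bind_guest()` and `toy_sva_bind()` are made-up names that only mimic the state transitions described above, and they are not the kernel or /dev/ioasid API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_state { TOY_FREE = 0, TOY_ALLOCATED, TOY_BOUND };

struct toy_pasid {
	enum toy_state state;
	const void *mm;		/* stand-in for the bound address space */
};

#define TOY_MAX_PASID 16
static struct toy_pasid toy_table[TOY_MAX_PASID];

/* Guest SVA, step 1: allocate a "clean" PASID, just a number with no mm. */
static int toy_alloc(void)
{
	for (int i = 1; i < TOY_MAX_PASID; i++) {	/* 0 is reserved here */
		if (toy_table[i].state == TOY_FREE) {
			toy_table[i].state = TOY_ALLOCATED;
			toy_table[i].mm = NULL;
			return i;
		}
	}
	return -1;
}

/* Guest SVA, step 2: bind an already allocated, not yet bound PASID. */
static int toy_bind_guest(int pasid, const void *guest_mm)
{
	if (pasid <= 0 || pasid >= TOY_MAX_PASID)
		return -1;
	if (toy_table[pasid].state != TOY_ALLOCATED)
		return -1;
	toy_table[pasid].state = TOY_BOUND;
	toy_table[pasid].mm = guest_mm;
	return 0;
}

/* Native SVA: allocation happens as part of bind, in a single call. */
static int toy_sva_bind(const void *mm)
{
	int pasid = toy_alloc();
	int ret;

	if (pasid < 0)
		return -1;
	ret = toy_bind_guest(pasid, mm);
	assert(ret == 0);	/* a freshly allocated PASID is always bindable */
	(void)ret;
	return pasid;
}
```

In this model a PASID returned by `toy_alloc()` stays in the allocated state with a NULL mm until a later bind, whereas `toy_sva_bind()` never exposes that intermediate state.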

IMHO a user-created PASID is either bound to a mm (current) at creation
time, or it will never be bound to a mm and its page table is under
user control via /dev/ioasid.
True for a PASID used in native SVA bind. But for binding with a guest mm,
the PASID is allocated first (VT-d virtual cmd interface Spec 10.4.44), then
bound with the host IOMMU when the vIOMMU PASID cache is invalidated.

Our intention is to have two separate interfaces:
1. /dev/ioasid (allocation/free only)
2. /dev/sva (handles all SVA related activities including page tables)

I thought the whole point of something like a /dev/ioasid was to get
away from each and every device creating its own PASID interface?
yes, but only for the use cases that need to expose PASID to the userspace.
AFAICT, the cases are:
1. guest SVA (bind guest mm)
2. full PF/VF assignment (not mediated), where the guest driver wants to program
the actual PASID onto the device.

It may be somewhat reasonable that some devices could have some easy
'make a SVA PASID on current' interface built in,
I agree, this is the case where the PASID is hidden from userspace, right? e.g.
uacce.

but anything more
complicated should use /dev/ioasid, and anything consuming PASID
should also have an API to import and attach a PASID from /dev/ioasid.
Would the above two use cases meet the "complicated" criterion? Or
should we say that anything that needs the explicit PASID value has to go through
/dev/ioasid?

Could you give some high-level hints on the APIs that hook up an IOASID
allocated from /dev/ioasid and use cases that combine device and domain
information? Yi is working on /dev/sva RFC, it would be good to have a
direction check.

Currently, the proposed /dev/ioasid interface does not map an individual
PASID to an FD. The FD is at the ioasid_set granularity and bound to
the current mm. We could extend the IOCTLs to cover the individual PASID-FD
passing case when use cases arise. Would this work?
Is it a good idea that the FD is per ioasid_set ?
We were thinking the allocation IOCTL is on a per-set basis, so that we know
the ownership relation between PASIDs and their set. If a per-PASID FD is needed, we
can extend.

What is the set used
for?
I tried to document the concept in
https://lore.kernel.org/lkml/1614463286-97618-2-git-send-email-jacob.jun.pan@linux.intel.com/ (local)

In terms of usage for guest SVA, an ioasid_set is mostly tied to a host mm.
The use cases are as follows:
1. Identify a pool of PASIDs for permission checking (belonging to the same VM),
e.g. only allow SVA binding for PASIDs allocated from the same set.

2. Allow different PASID-aware kernel subsystems to associate, e.g. KVM,
device drivers, and IOMMU driver. i.e. each KVM instance only cares about
the ioasid_set associated with the VM. Event notifications are also kept within
the ioasid_set to synchronize PASID states.

3. Guest-Host PASID look up (each set has its own XArray to store the
mapping)

4. Quota control (going away once we have cgroup)
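
To make items 1, 3 and 4 above concrete, here is a toy userspace model of a set. It is only a sketch under stated assumptions: the names (`toy_ioasid_set`, `toy_set_add()`, `toy_set_find_host()`, `toy_set_may_bind()`) and the quota value are invented, and a small fixed array stands in for the per-set XArray mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SET_QUOTA 4		/* hypothetical per-set limit */

struct toy_entry {
	bool used;
	int guest_pasid;
	int host_pasid;
};

/* Toy stand-in for an ioasid_set tied to one host mm. */
struct toy_ioasid_set {
	const void *owner_mm;
	int nr_used;
	struct toy_entry map[TOY_SET_QUOTA];	/* stands in for the XArray */
};

/* Item 4: adding a mapping fails once the set's quota is used up. */
static int toy_set_add(struct toy_ioasid_set *set, int guest, int host)
{
	assert(set->nr_used >= 0 && set->nr_used <= TOY_SET_QUOTA);
	if (set->nr_used >= TOY_SET_QUOTA)
		return -1;
	for (size_t i = 0; i < TOY_SET_QUOTA; i++) {
		if (!set->map[i].used) {
			set->map[i] = (struct toy_entry){ true, guest, host };
			set->nr_used++;
			return 0;
		}
	}
	return -1;
}

/* Item 3: guest-to-host PASID lookup within one set; -1 if not found. */
static int toy_set_find_host(const struct toy_ioasid_set *set, int guest)
{
	for (size_t i = 0; i < TOY_SET_QUOTA; i++)
		if (set->map[i].used && set->map[i].guest_pasid == guest)
			return set->map[i].host_pasid;
	return -1;
}

/* Item 1: allow a bind only for the owner mm and a PASID from this set. */
static bool toy_set_may_bind(const struct toy_ioasid_set *set,
			     const void *caller_mm, int host)
{
	if (caller_mm != set->owner_mm)
		return false;
	for (size_t i = 0; i < TOY_SET_QUOTA; i++)
		if (set->map[i].used && set->map[i].host_pasid == host)
			return true;
	return false;
}
```

A caller would keep one such set per VM, so a bind request for a host PASID outside the set, or from a different mm, is rejected by `toy_set_may_bind()`.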

Usually kernel interfaces work nicer with a one fd/one object model.

But even if it is a set, you could pass the set between co-operating
processes and the PASID can be created in the correct 'current'. But
there are all kinds of security questions as soon as you start doing
anything like this - is there really a use case?
We don't see a use case for passing an ioasid_set to another process. All
four use cases above are for the current process.

Jason

Thanks,

Jacob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
