Thread: 268 messages, 15 authors, 2021-06-08

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Jean-Philippe Brucker <hidden>
Date: 2021-03-26 08:07:44
Also in: linux-iommu, lkml

On Thu, Mar 25, 2021 at 02:16:45PM -0300, Jason Gunthorpe wrote:
On Thu, Mar 25, 2021 at 10:02:36AM -0700, Jacob Pan wrote:
quoted
Hi Jean-Philippe,

On Thu, 25 Mar 2021 11:21:40 +0100, Jean-Philippe Brucker
[off-list ref] wrote:
quoted
On Wed, Mar 24, 2021 at 03:12:30PM -0700, Jacob Pan wrote:
quoted
Hi Jason,

On Wed, 24 Mar 2021 14:03:38 -0300, Jason Gunthorpe [off-list ref]
wrote: 
quoted
On Wed, Mar 24, 2021 at 10:02:46AM -0700, Jacob Pan wrote:  
quoted
quoted
Also wondering about device driver allocating auxiliary domains
for their private use, to do iommu_map/unmap on private PASIDs (a
clean replacement to super SVA, for example). Would that go
through the same path as /dev/ioasid and use the cgroup of
current task?    
For the in-kernel private use, I don't think we should restrict
based on cgroup, since there is no affinity to user processes. I
also think the PASID allocation should just use kernel API instead
of /dev/ioasid. Why would user space need to know the actual PASID
# for device private domains? Maybe I missed your idea?    
There is not much in the kernel that isn't triggered by a process, I
would be careful about the idea that there is a class of users that
can consume a cgroup controlled resource without being inside the
cgroup.

We've gotten into trouble before by overlooking this, and with something
greenfield like PASID it would be best to build it into the API to prevent
a mistake, e.g. by accepting a cgroup or process as input to the allocator.
  
Makes sense. But I think we only allow charging the current cgroup. How
about I add the following to ioasid_alloc():

	misc_cg = get_current_misc_cg();
	ret = misc_cg_try_charge(MISC_CG_RES_IOASID, misc_cg, 1);
	if (ret) {
		put_misc_cg(misc_cg);
		return ret;
	}  
Does that allow PASID allocation during driver probe, in kernel_init or
modprobe context?
Good point. Yes, you can get cgroup subsystem state in kernel_init for
charging/uncharging. I would think module_init should work also since it is
after kernel_init. I have tried the following:
static int __ref kernel_init(void *unused)
 {
        int ret;
+       struct cgroup_subsys_state *css;
+       css = task_get_css(current, pids_cgrp_id);

But that would imply:
1. IOASID has to be built-in, not a module
If IOASID is a module, the device driver will probe once the IOMMU module
is available, which I think always happens in the probe deferral kworker.
quoted
2. IOASIDs charged on PID1/init would not be subject to the cgroup limit,
since it will be in the root cgroup and we don't support migration, nor
will we.

Then it comes back to the question of why do we try to limit in-kernel
users per cgroup if we can't enforce these cases.
It may be better to explicitly pass a cgroup during allocation as Jason
suggested. That way anyone using the API will have to be aware of this and
pass the root cgroup if that's what they want.
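To make the suggestion concrete, here is a minimal userspace sketch of an
allocator that takes the cgroup to charge as an explicit argument. It is not
kernel code and not the API proposed in this series: every mock_* name, the
struct layout, and the choice of -EINVAL and -EBUSY as error codes are
assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup that limits one resource (IOASIDs). */
struct mock_cg {
	const char *name;
	long limit;	/* maximum number of IOASIDs, negative means unlimited */
	long usage;	/* IOASIDs currently charged to this group */
};

/* Hypothetical root group: no limit applies. */
struct mock_cg mock_root_cg = { "root", -1, 0 };

/* Charge 'amount' units, or fail without charging if the limit would be crossed. */
int mock_cg_try_charge(struct mock_cg *cg, long amount)
{
	if (cg->limit >= 0 && cg->usage + amount > cg->limit)
		return -EBUSY;
	cg->usage += amount;
	return 0;
}

/* Give back 'amount' previously charged units. */
void mock_cg_uncharge(struct mock_cg *cg, long amount)
{
	assert(cg->usage >= amount);
	cg->usage -= amount;
}

int mock_next_ioasid = 1;

/*
 * Allocate an ID and charge it to the group the caller names.
 * There is no implicit "current" lookup: a caller that wants the
 * root group has to say so by passing &mock_root_cg.
 * Returns a positive ID on success or a negative errno.
 */
int mock_ioasid_alloc(struct mock_cg *cg)
{
	int ret;

	if (!cg)
		return -EINVAL;

	ret = mock_cg_try_charge(cg, 1);
	if (ret)
		return ret;

	return mock_next_ioasid++;
}
```

With this shape, a boot-time or probe-time caller would pass &mock_root_cg
deliberately, rather than inheriting whatever group the current task happens
to be in.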
Are these real use cases? Why would a driver binding to a device
create a single kernel pasid at bind time? Why wouldn't it use
untagged DMA?
It's not inconceivable to have a control queue doing DMA tagged with
PASID. The devices I know either use untagged DMA, or have a choice to use
a PASID. We're not outright forbidding PASID allocation at boot (I don't
think we can or should) and we won't be able to check every use of the
API, so I'm trying to figure out whether it will always default to root
cgroup, or crash in some corner case.

Thanks,
Jean