Thread: 268 messages, 15 authors, 2021-06-08

RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Tian, Kevin <hidden>
Date: 2021-05-07 07:39:02
Also in: linux-iommu, lkml

> From: Alex Williamson <redacted>
> Sent: Wednesday, April 28, 2021 11:06 PM
>
> On Wed, 28 Apr 2021 06:34:11 +0000
> "Tian, Kevin" [off-list ref] wrote:
>
> > > From: Jason Gunthorpe <redacted>
> > > Sent: Monday, April 26, 2021 8:38 PM
> > > [...]
> > > > I'd like to hear your opinion on one open question here. There is no
> > > > doubt that an ioasid represents a HW page table when the table is
> > > > constructed by userspace and then linked to the IOMMU through the
> > > > bind/unbind API. But I'm not sure whether an ioasid should represent
> > > > the exact pgtable or the mapping metadata when the underlying
> > > > pgtable is indirectly constructed through the map/unmap API. VFIO
> > > > takes the latter approach, which is why it allows multiple
> > > > incompatible domains in a single container, all sharing the same
> > > > mapping metadata.
> > >
> > > I think VFIO's map/unmap is way too complex and we know it has bad
> > > performance problems.
> >
> > Can you or Alex elaborate on where the complexity and performance
> > problems lie in VFIO map/unmap? We'd like to understand more detail
> > and see how to avoid them in the new interface.
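To make the distinction in the question above concrete, here is a minimal Python model of the VFIO-style arrangement as described: the container owns the mapping metadata, and each IOMMU domain attached to it keeps its own page table, populated by replaying that metadata. All class and method names are hypothetical, page tables are flattened into dictionaries, and differing page sizes merely stand in for incompatible page-table formats. In these terms, an ioasid that represents the mapping metadata corresponds to `Container`, while one that represents the exact pgtable corresponds to a single `Domain`.

```python
class Domain:
    """Hypothetical IOMMU domain with its own flattened page table: iova -> phys."""

    def __init__(self, name, page_size):
        self.name = name
        self.page_size = page_size  # stands in for the domain's pgtable format
        self.pgtable = {}

    def map(self, iova, phys, size):
        # Install one entry per page of this domain's own page size.
        for off in range(0, size, self.page_size):
            self.pgtable[iova + off] = phys + off


class Container:
    """Hypothetical container: holds mapping metadata shared by all domains."""

    def __init__(self):
        self.mappings = []  # (iova, phys, size) tuples, independent of any pgtable
        self.domains = []

    def map(self, iova, phys, size):
        # Record the metadata once, then program every attached domain.
        self.mappings.append((iova, phys, size))
        for domain in self.domains:
            domain.map(iova, phys, size)

    def attach(self, domain):
        # A newly attached domain is built by replaying the metadata, so
        # domains using different page sizes can sit in one container.
        for iova, phys, size in self.mappings:
            domain.map(iova, phys, size)
        self.domains.append(domain)


container = Container()
small = Domain("4k-pages", 0x1000)
container.attach(small)
container.map(0x10000, 0x80000, 0x2000)
big = Domain("8k-pages", 0x2000)
container.attach(big)  # populated from the metadata, not from small.pgtable
```

After this runs, the two domains hold different page tables (two 4 KiB entries versus one 8 KiB entry) derived from the same single metadata record.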

> The map/unmap interface is really only good for long lived mappings,
> the overhead is too high for things like vIOMMU use cases or any case
> where the mapping is intended to be dynamic.  Userspace drivers must
> make use of a long lived buffer mapping in order to achieve performance.
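A rough way to see why long-lived mappings matter is to count map/unmap calls. The toy model below (hypothetical names, no real ioctls, and no attempt to model the cost of each call) compares mapping a buffer around every I/O with mapping one pool up front and recycling slots from it.

```python
class CallCounter:
    """Hypothetical stand-in for a map/unmap interface that only counts calls."""

    def __init__(self):
        self.calls = 0

    def map(self, iova, size):
        self.calls += 1

    def unmap(self, iova, size):
        self.calls += 1


def short_lived(iommu, num_ios, buf_size):
    """Map each DMA buffer right before its I/O and unmap it right after."""
    for i in range(num_ios):
        iova = i * buf_size
        iommu.map(iova, buf_size)
        # ... the device would perform its DMA here ...
        iommu.unmap(iova, buf_size)


def long_lived(iommu, num_ios, buf_size, pool_bufs):
    """Map one pool once, then reuse its slots; returns the IOVA used per I/O."""
    iommu.map(0, buf_size * pool_bufs)
    return [(i % pool_bufs) * buf_size for i in range(num_ios)]


a = CallCounter()
short_lived(a, 1000, 4096)
b = CallCounter()
slots = long_lived(b, 1000, 4096, 64)
print(a.calls, b.calls)  # 2000 1
```

The call count grows with the number of I/Os in the dynamic case and stays constant in the pooled case, which is the trade-off that pushes userspace drivers toward long-lived buffers.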

This is not a limitation of VFIO map/unmap. It's a limitation of any
map/unmap semantics, since whether a mapping is long-lived or
short-lived is decided by userspace. Nested translation is the only
viable optimization that allows the 2nd-level mapping to stay long-lived
even with a vIOMMU. From this angle, I'm not sure how a new map/unmap
implementation alone could address this performance limitation.
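Nested translation, as described above, composes two lookups: a 1st-level table managed by the guest through the vIOMMU (guest IOVA to guest physical address) and a 2nd-level table managed by the host (guest physical to host physical address). The Python sketch below assumes 4 KiB pages at both levels, uses dictionaries instead of real page tables, and ignores that hardware also reaches the guest's own page-table pages through the 2nd level. It only illustrates why the 2nd level can stay long-lived: guest map/unmap activity touches the 1st level alone.

```python
PAGE = 0x1000  # assumed 4 KiB pages at both levels


def translate(stage1, stage2, giova):
    """Walk the guest (1st-level) table, then the host (2nd-level) table.

    stage1: guest IOVA page -> guest physical page (owned by the guest).
    stage2: guest physical page -> host physical page (owned by the host).
    Returns the host physical address, or None to model a DMA fault.
    """
    page, offset = giova & ~(PAGE - 1), giova & (PAGE - 1)
    gpa = stage1.get(page)
    if gpa is None:
        return None
    hpa = stage2.get(gpa)
    if hpa is None:
        return None
    return hpa + offset


# The host maps all of (a tiny, 64 KiB) guest RAM once; this never changes.
stage2 = {gpa: 0x100000 + gpa for gpa in range(0, 0x10000, PAGE)}

# The guest maps and unmaps at will, touching only its own table.
stage1 = {0x80000000: 0x3000}
print(hex(translate(stage1, stage2, 0x80000123)))  # 0x103123
del stage1[0x80000000]  # guest-side unmap, no host map/unmap call needed
print(translate(stage1, stage2, 0x80000123))  # None
```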

> The mapping and unmapping granularity has been a problem as well;
> type1v1 allowed arbitrary unmaps to bisect the original mapping, with
> the massive caveat that the caller relies on the return value of the
> unmap to determine what was actually unmapped because the IOMMU use of
> superpages is transparent to the caller.  This led to type1v2 that
> simply restricts the user to avoid ever bisecting mappings.  That still
> leaves us with problems for things like virtio-mem support where we
> need to create initial mappings with a granularity that allows us to
> later remove entries, which can prevent effective use of IOMMU
> superpages.
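The difference between the two unmap semantics described above can be modelled with a list of (iova, size) mappings. This is a simplified Python sketch, not the kernel code, and the function names are hypothetical. For the v1-style rule it assumes, for illustration, that a request covering the first IOVA of a mapping removes the whole mapping while a request entering a mapping part-way removes nothing, so only the returned size tells the caller what happened. The v2-style rule rejects any request that bisects a mapping.

```python
import errno


def unmap_v1(mappings, iova, size):
    """v1-style unmap: never fails, returns the number of bytes unmapped.

    Assumed rule for this sketch: a mapping whose first IOVA lies inside the
    request is removed entirely, even beyond the requested end; a mapping the
    request only enters part-way through is left untouched.
    """
    end = iova + size
    unmapped = 0
    for m_iova, m_size in list(mappings):
        if iova <= m_iova < end:
            mappings.remove((m_iova, m_size))
            unmapped += m_size
    return unmapped


def unmap_v2(mappings, iova, size):
    """v2-style unmap: reject any request that bisects an existing mapping."""
    end = iova + size
    for m_iova, m_size in mappings:
        m_end = m_iova + m_size
        overlaps = m_iova < end and iova < m_end
        if overlaps and (m_iova < iova or m_end > end):
            return -errno.EINVAL  # mimic an ioctl's negative-errno return
    unmapped = 0
    for m_iova, m_size in list(mappings):
        if iova <= m_iova and m_iova + m_size <= end:
            mappings.remove((m_iova, m_size))
            unmapped += m_size
    return unmapped


maps = [(0x0, 0x200000)]  # one 2 MiB mapping, possibly a single superpage
print(unmap_v1(list(maps), 0x0, 0x1000))     # 2097152: asked for 4 KiB, lost 2 MiB
print(unmap_v1(list(maps), 0x1000, 0x1000))  # 0: "success", nothing unmapped
print(unmap_v2(list(maps), 0x0, 0x1000))     # -22: bisecting request rejected
```

The v2-style function is deterministic for the caller, at the price of forcing the mapping granularity to be chosen up front, which is exactly the virtio-mem concern.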

We could start with semantics similar to type1v2.

btw, why does virtio-mem require a smaller granularity? Could we split
superpages on the fly when a removal actually happens (similar to the
page splitting done for efficient dirty page tracking during VM live
migration)?

And isn't this another problem imposed by userspace? How could a new
map/unmap implementation mitigate it if userspace insists on a smaller
granularity for the initial mappings?
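The alternative raised above, splitting a superpage only when a partial removal actually arrives, could look roughly like this. It is a hypothetical Python model with a flat table of leaf entries (IOVA mapped to physical address and leaf size), 2 MiB superpages and 4 KiB base pages; a real IOMMU driver would also have to replace the entry atomically and flush the IOTLB, which is not modelled here.

```python
SZ_4K = 0x1000
SZ_2M = 0x200000


def unmap_range(pgtable, iova, size):
    """Remove [iova, iova + size) from a table of leaf entries.

    pgtable maps the IOVA of each leaf -> (phys, leaf_size). A superpage the
    range only partially covers is split into 4 KiB leaves on the fly, so
    only the requested pages disappear. Assumes 4 KiB aligned iova and size.
    """
    end = iova + size
    for leaf_iova, (phys, leaf_size) in list(pgtable.items()):
        leaf_end = leaf_iova + leaf_size
        if leaf_end <= iova or leaf_iova >= end:
            continue  # no overlap with the request
        del pgtable[leaf_iova]
        if leaf_iova >= iova and leaf_end <= end:
            continue  # fully covered: drop the whole leaf
        # Partially covered: split, re-adding the pages outside the range.
        for off in range(0, leaf_size, SZ_4K):
            page = leaf_iova + off
            if not iova <= page < end:
                pgtable[page] = (phys + off, SZ_4K)


table = {0x0: (0x40000000, SZ_2M)}  # one 2 MiB superpage
unmap_range(table, 0x1000, SZ_4K)   # remove a single 4 KiB page
print(len(table))                   # 511
```

With this approach the initial mapping can use superpages, and the cost of the finer granularity is paid only for the ranges that are actually removed.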

> Locked page accounting has been another constant issue.  We perform
> locked page accounting at the container level, where each container
> accounts independently.  A user may require multiple containers, the
> containers may pin the same physical memory, but be accounted against
> the user once per container.

For /dev/ioasid, it is still an open question whether a process is
allowed to open /dev/ioasid once or multiple times. If there is only one
ioasid_fd per process, the accounting can be made accurate. Otherwise
the same problem still exists, since each ioasid_fd is akin to a
container, and we would need to find a better solution.
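The double-counting problem and the single-ioasid_fd option can be compared with a small model. In the Python sketch below (hypothetical names; pages identified by page frame number), per-container accounting charges a page once per container that pins it, while the per-process accountant charges each distinct page once and keeps a pin count so that one container releasing a page does not uncharge a page another still pins. This is an illustration of the idea, not how VFIO or /dev/ioasid actually account.

```python
from collections import Counter


def per_container_charge(containers):
    """Each container charges the user for every page it pins."""
    return sum(len(pinned) for pinned in containers)


class ProcessAccounting:
    """Hypothetical per-process accountant: one charge per distinct page."""

    def __init__(self):
        self.pins = Counter()  # pfn -> number of outstanding pins

    def pin(self, pfn):
        self.pins[pfn] += 1

    def unpin(self, pfn):
        self.pins[pfn] -= 1
        if self.pins[pfn] == 0:
            del self.pins[pfn]  # last pin gone: stop charging for this page

    @property
    def locked(self):
        return len(self.pins)


# Two containers (or two fds) pinning overlapping memory.
containers = [{1, 2, 3, 4}, {3, 4, 5}]
print(per_container_charge(containers))  # 7: pages 3 and 4 counted twice

acct = ProcessAccounting()
for pinned in containers:
    for pfn in pinned:
        acct.pin(pfn)
print(acct.locked)  # 5
acct.unpin(3)       # one container releases page 3
print(acct.locked)  # 5: the other container still pins it
```

If a process could hold several ioasid_fds, each fd would need to share one such accountant rather than own its own, which is the "better solution" the paragraph above leaves open.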

> Those are the main ones I can think of.  It is nice to have a simple
> map/unmap interface; I'd hope that a new /dev/ioasid interface wouldn't
> raise the barrier to entry too high, but the user needs to have the
> ability to have more control of their mappings and locked page
> accounting should probably be offloaded somewhere.  Thanks,

Based on your feedback, I feel it's probably reasonable to start with
type1v2 semantics for the new interface. Locked page accounting could
also start with the same restriction as VFIO and then be improved
incrementally if a cleaner approach turns out to be intrusive (as long
as the improvement doesn't affect the uAPI).
But I didn't understand the suggestion about "more control of their
mappings". Can you elaborate?

Thanks
Kevin