Thread: 37 messages, 8 authors, 2018-11-27

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Kenneth Lee <hidden>
Date: 2018-11-23 08:01:15
Also in: linux-crypto, linux-doc, linux-rdma, lkml

On Wed, Nov 21, 2018 at 07:58:40PM -0700, Jason Gunthorpe wrote:
Date: Wed, 21 Nov 2018 19:58:40 -0700
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Kenneth Lee <redacted>
CC: Leon Romanovsky <leon@kernel.org>, Kenneth Lee <redacted>,
 Tim Sell [off-list ref], linux-doc@vger.kernel.org, Alexander
 Shishkin [off-list ref], Zaibo Xu
 [off-list ref], zhangfei.gao@foxmail.com, linuxarm@huawei.com,
 haojian.zhuang@linaro.org, Christoph Lameter [off-list ref], Hao Fang
 [off-list ref], Gavin Schenk [off-list ref], RDMA mailing
 list [off-list ref], Zhou Wang [off-list ref],
 Doug Ledford [off-list ref], Uwe Kleine-König
 [off-list ref], David Kershner
 [off-list ref], Johan Hovold [off-list ref], Cyrille
 Pitchen [off-list ref], Sagar Dharia
 [off-list ref], Jens Axboe [off-list ref],
 guodong.xu@linaro.org, linux-netdev [off-list ref], Randy Dunlap
 [off-list ref], linux-kernel@vger.kernel.org, Vinod Koul
 [off-list ref], linux-crypto@vger.kernel.org, Philippe Ombredanne
 [off-list ref], Sanyog Kale [off-list ref], "David S.
 Miller" [off-list ref], linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
User-Agent: Mutt/1.9.4 (2018-02-28)
Message-ID: [ref]

On Wed, Nov 21, 2018 at 02:08:05PM +0800, Kenneth Lee wrote:
But considering Jean's SVA stuff seems based on mmu notifiers, I have
a hard time believing that it has any different behavior from RDMA's
ODP, and if it does have different behavior, then it is probably just
a bug in the ODP implementation.
As Jean has explained, his solution is based on page table sharing. I think ODP
should also consider this new feature.
Shared page tables would require the HW to walk the page table format
of the CPU directly, not sure how that would be possible for ODP?

Presumably the implementation for ARM relies on the IOMMU hardware
doing this?
Yes, that is the idea. And since Jean is merging the AMD and Intel solutions
together, I assume they can do the same. This is also the reason I want to solve
my problem on top of the IOMMU directly. But anyway, let me try to see if I can
merge the logic with ODP.
If all your driver needs is to mmap some PCI bar space, route
interrupts and do DMA mapping then mediated VFIO is probably a good
choice. 
Yes. That is what was done in our RFCv1/v2. But we accepted Jerome's opinion and
tried not to add complexity to the mm subsystem.
Why would a mediated VFIO driver touch the mm subsystem? Sounds like
you don't have a VFIO driver if it needs to do stuff like that...
VFIO has no ODP-like solution, and if we want to solve the fork problem, we have
to make some changes to the iommu and the fork procedure. Further, VFIO treats
every queue as an independent device. This creates a lot of trouble for resource
management. For example, you will need a manager process to reclaim unused
devices, and you need to let the user process know the PASID of the queue, and
so on.
Well, I would think you'd add SVA support to the VFIO driver as a
generic capability - it seems pretty useful for any VFIO user as it
avoids all the kernel upcalls to do memory pinning and DMA address
translation.
It is already part of Jean's patchset. And that's why I built my solution on
VFIO in the first place. But I think the concept of SVA and PASID is not
compatible with the original VFIO concept space. You would not share your whole
address space with a device in a virtual machine manager, would you? And
if you can manage to have a separate mdev for your virtual machine, why bother
to set a PASID for it?  The answer to those problems, I think, will be Intel's
Scalable IO Virtualization. For an accelerator, the requirement is simply: getting
a handle to the device, attaching the process's mm to the handle by sharing the
process's page table with its iommu, indexed by PASID, and starting the
communication...
Once the VFIO driver knows about this as a generic capability then the
device it exposes to userspace would use CPU addresses instead of DMA
addresses.

The question is if your driver needs much more than the device
agnostic generic services VFIO provides.

I'm not sure what you have in mind with resource management.. It is
hard to revoke resources from userspace, unless you are doing
kernel syscalls, but then why do all this?
Say I have 1024 queues in my accelerator. I can get one by opening the device;
the queue is then attached to the fd. If the process exits by any means, the queue
is returned when the fd is released. But if it is an mdev, it will still be there
and someone has to tell the allocator it is available again. This is not easy
to design in user space.
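
The fd-tied queue lifetime described above can be sketched as a small
user-space model. This is only an illustration under stated assumptions, not
the uacce implementation: struct queue_pool, struct queue_handle, pool_init(),
handle_open() and handle_release() are hypothetical names, and
handle_release() stands in for the driver's release callback, which the kernel
invokes when the last reference to the fd goes away, including on process exit.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_QUEUES 1024  /* figure taken from the example in the mail */

/* Hypothetical model of the accelerator's queue pool. */
struct queue_pool {
	bool in_use[NR_QUEUES];
	size_t nr_free;
};

/* Hypothetical per-open state, standing in for file->private_data. */
struct queue_handle {
	struct queue_pool *pool;
	int qid;                /* -1 when no queue is attached */
};

void pool_init(struct queue_pool *pool)
{
	for (size_t i = 0; i < NR_QUEUES; i++)
		pool->in_use[i] = false;
	pool->nr_free = NR_QUEUES;
}

/*
 * Models open(): take the first free queue and bind it to the handle.
 * Returns 0 on success, -1 if every queue is already taken.
 */
int handle_open(struct queue_handle *h, struct queue_pool *pool)
{
	h->pool = pool;
	h->qid = -1;
	for (int i = 0; i < NR_QUEUES; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			pool->nr_free--;
			h->qid = i;
			return 0;
		}
	}
	return -1;
}

/*
 * Models release(): hand the queue back to the pool. No separate manager
 * process is needed because this runs whenever the handle goes away.
 * Calling it twice on the same handle is harmless.
 */
void handle_release(struct queue_handle *h)
{
	if (h->qid < 0)
		return;
	h->pool->in_use[h->qid] = false;
	h->pool->nr_free++;
	h->qid = -1;
}
```

In this model the allocator never has to be told that a queue is free again:
the same path that tears down the handle returns the queue. With an mdev the
device outlives the process that used it, so that notification would have to
be designed separately.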
Jason
-- 