Thread (57 messages), 11 authors, 2018-08-14

Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

From: Kenneth Lee <hidden>
Date: 2018-08-13 09:31:11
Also in: kvm, linux-doc, linux-iommu, lkml

On Sat, Aug 11, 2018 at 11:26:48PM +0800, Kenneth Lee wrote:
Date: Sat, 11 Aug 2018 23:26:48 +0800
From: Kenneth Lee <redacted>
To: Jean-Philippe Brucker <redacted>, Kenneth Lee
 [off-list ref], Jerome Glisse [off-list ref]
CC: Herbert Xu <herbert@gondor.apana.org.au>, "kvm@vger.kernel.org"
 [off-list ref], Jonathan Corbet [off-list ref], Greg
 Kroah-Hartman [off-list ref], Zaibo Xu [off-list ref],
 "linux-doc@vger.kernel.org" [off-list ref], "Kumar, Sanjay K"
 [off-list ref], "Tian, Kevin" [off-list ref],
 "iommu@lists.linux-foundation.org" [off-list ref],
 "linux-kernel@vger.kernel.org" [off-list ref],
 "linuxarm@huawei.com" [off-list ref], Alex Williamson
 [off-list ref], "linux-crypto@vger.kernel.org"
 [off-list ref], Philippe Ombredanne
 [off-list ref], Thomas Gleixner [off-list ref], Hao Fang
 [off-list ref], "David S . Miller" [off-list ref],
 "linux-accelerators@lists.ozlabs.org"
 [off-list ref]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
Message-ID: [ref]



On Friday, 2018-08-10 at 09:12 PM, Jean-Philippe Brucker wrote:
quoted
Hi Kenneth,

On 10/08/18 04:39, Kenneth Lee wrote:
quoted
quoted
You can achieve everything you want to achieve with existing upstream
solution. Re-inventing a whole new driver infrastructure should really
be motivated with strong and obvious reasons.
I want to better understand your idea. If I create some unified helper
APIs in drivers/iommu/, say:

wd_create_dev(parent_dev, wd_dev)
wd_release_dev(wd_dev)

The API creates a chrdev to take requests from user space for open (resource
allocation), iomap, epoll (irq), and dma_map (with pasid automatically).

Do you think it is acceptable?
Maybe not drivers/iommu/ :) That subsystem only contains tools for
dealing with DMA, I don't think epoll, resource enumeration or iomap fit
in there.
Yes. I should consider where to put it carefully.
quoted
Creating new helpers seems to be precisely what we're trying to avoid in
this thread, and vfio-mdev does provide the components that you
describe, so I wouldn't discard it right away. When the GPU, net, block
or another subsystem doesn't fit your needs, either because your
accelerator provides some specialized function, or because for
performance reasons your client wants direct MMIO access, you can at
least build your driver and library on top of those existing VFIO
components:

* open allocates a partition of an accelerator.
* vfio_device_info, vfio_region_info and vfio_irq_info enumerate
available resources.
* vfio_irq_set deals with epoll.
* mmap gives you a private MMIO doorbell.
* vfio_iommu_type1 provides the DMA operations.

Currently missing:

* Sharing the parent IOMMU between mdevs, which is also what the "IOMMU
aware mediated device" series tackles, and seems like a logical addition
to VFIO. I'd argue that the existing IOMMU ops (or ones implemented by
the SVA series) can be used to deal with this.

* The interface to discover an accelerator near your memory node, or one
that you can chain with other devices. If I understood correctly the
conclusion was that the API (a topology description in sysfs?) should be
common to various subsystems, in which case vfio-mdev (or the mediating
driver) could also use it.

* The queue abstraction discussed on patch 3/7. Perhaps the current vfio
resource description of MMIO and IRQ is sufficient here as well, since
vendors tend to each implement their own queue schemes. If you need
additional features, read/write fops give the mediating driver a lot of
freedom. To support features that are too specific for drivers/vfio/ you
can implement a config space with capabilities and registers of your
choice. If you're versioning the capabilities, the code to handle them
could even be shared between different accelerator drivers and libraries.
Thank you, Jean,

The major reason I want to remove the dependency on VFIO is this: I
accept that the whole logic of VFIO is built on the idea of
creating a virtual device.

Let's consider it this way: we have hardware with IOMMU support.
So we create a default_domain on the particular IOMMU (unit) for the
group, for the kernel driver to use. Now the device is going to be
used by a VM or a container. So we unbind it from the original
driver, put the default_domain away, and create a new domain for
this particular use case. The device then shows up as a platform
or PCI device to user space. This is what VFIO tries to provide.
Mdev extends the scenario but does not change the intention. And I
think that is why Alex emphasizes pre-allocating resources to the mdev.

But what WarpDrive needs is to get service from the hardware itself
and set mappings in its current domain, aka the default_domain. If we do
it in VFIO-mdev, the VFIO framework takes all the effort to put the
default_domain away, create a new one and get it ready for user
space to use. But then I tell it to stop using the new domain and
go back to the original one...

That is not reasonable, is it? :)

So why don't I just take the request and set it into the
default_domain directly? The true requirement of WarpDrive is to let
a process set the page table for a particular PASID or substream ID, so
the hardware can accept commands with addresses in the process address
space. It needs no device.

From this perspective, there seems to be no reason to keep it in VFIO.
I made a quick change based on RFCv1 here:

https://github.com/Kenneth-Lee/linux-kernel-warpdrive/commits/warpdrive-v0.6

I have only made it compile and have not tested it yet. But it shows where
the idea is going.

The pro is: most of the virtual device stuff can be removed. Resource
management is done on the opened files only.

The con is: as Jean said, we have to redo some things that VFIO has already
done. These are mainly:

1. Track the DMA operations and remove them when the resource is released
2. Pin the memory with gup and do accounting
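
To make the two items above concrete, here is a minimal user-space sketch of
the bookkeeping, not kernel code and not taken from the WarpDrive tree. Every
name in it (wd_ctx, wd_mapping, wd_ctx_map and so on) is hypothetical, the
page limit only stands in for a locked-memory limit, and the real pinning
(gup) and IOMMU calls are left out. It only shows the idea: each opened file
owns a list of mappings and a pinned-page counter, and release drops whatever
user space left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; a model of the bookkeeping only. */

struct wd_mapping {                 /* one tracked DMA mapping */
    unsigned long iova;             /* address the device would use */
    size_t npages;                  /* pages pinned for this mapping */
    struct wd_mapping *next;
};

struct wd_ctx {                     /* state attached to one opened file */
    struct wd_mapping *maps;        /* mappings still alive */
    size_t pinned_pages;            /* pages currently charged */
    size_t pin_limit;               /* assumed limit, in pages */
};

void wd_ctx_init(struct wd_ctx *ctx, size_t pin_limit)
{
    assert(ctx != NULL);
    ctx->maps = NULL;
    ctx->pinned_pages = 0;
    ctx->pin_limit = pin_limit;
}

/* Charge the pages, then track the mapping. Returns 0 on success, -1 if
 * the limit would be exceeded or allocation fails. */
int wd_ctx_map(struct wd_ctx *ctx, unsigned long iova, size_t npages)
{
    struct wd_mapping *m;

    assert(ctx != NULL);
    if (npages > ctx->pin_limit - ctx->pinned_pages)
        return -1;                  /* accounting says no */
    m = malloc(sizeof(*m));
    if (m == NULL)
        return -1;
    /* A real driver would pin the pages and program the IOMMU here. */
    m->iova = iova;
    m->npages = npages;
    m->next = ctx->maps;
    ctx->maps = m;
    ctx->pinned_pages += npages;
    return 0;
}

/* Drop one mapping by iova. Returns 0 on success, -1 if not tracked. */
int wd_ctx_unmap(struct wd_ctx *ctx, unsigned long iova)
{
    struct wd_mapping **pp;

    assert(ctx != NULL);
    for (pp = &ctx->maps; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->iova == iova) {
            struct wd_mapping *m = *pp;

            *pp = m->next;
            ctx->pinned_pages -= m->npages;   /* uncharge */
            free(m);
            return 0;
        }
    }
    return -1;
}

/* Called when the file is released: remove every mapping that is still
 * tracked. Returns the number of mappings that had to be cleaned up. */
size_t wd_ctx_release(struct wd_ctx *ctx)
{
    size_t n = 0;

    assert(ctx != NULL);
    while (ctx->maps != NULL) {
        struct wd_mapping *m = ctx->maps;

        ctx->maps = m->next;
        ctx->pinned_pages -= m->npages;
        free(m);
        n++;
    }
    return n;
}
```

A caller would run wd_ctx_init() at open, wd_ctx_map() and wd_ctx_unmap() on
the map and unmap requests, and wd_ctx_release() at close, so that a process
which exits without unmapping leaves nothing pinned.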

It is not going to be easy to make a decision...
Thanks
Kenneth
quoted
Thanks,
Jean