Thread: 56 messages, 10 authors, 2018-08-14

Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

From: Kenneth Lee <hidden>
Date: 2018-08-02 03:41:39
Also in: kvm, linux-crypto, linux-iommu, lkml

On Thu, Aug 02, 2018 at 02:59:33AM +0000, Tian, Kevin wrote:
Date: Thu, 2 Aug 2018 02:59:33 +0000
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Kenneth Lee <redacted>, Jonathan Corbet <corbet@lwn.net>,
 Herbert Xu [off-list ref], "David S . Miller"
 [off-list ref], Joerg Roedel [off-list ref], Alex Williamson
 [off-list ref], Kenneth Lee [off-list ref], Hao
 Fang [off-list ref], Zhou Wang [off-list ref], Zaibo Xu
 [off-list ref], Philippe Ombredanne [off-list ref], Greg
 Kroah-Hartman [off-list ref], Thomas Gleixner
 [off-list ref], "linux-doc@vger.kernel.org"
 [off-list ref], "linux-kernel@vger.kernel.org"
 [off-list ref], "linux-crypto@vger.kernel.org"
 [off-list ref], "iommu@lists.linux-foundation.org"
 [off-list ref], "kvm@vger.kernel.org"
 [off-list ref], "linux-accelerators@lists.ozlabs.org"
 [off-list ref], Lu Baolu
 [off-list ref], "Kumar, Sanjay K" [off-list ref]
CC: "linuxarm@huawei.com" <redacted>
Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
Message-ID: [ref]
From: Kenneth Lee
Sent: Wednesday, August 1, 2018 6:22 PM

From: Kenneth Lee <redacted>

WarpDrive is an accelerator framework to expose the hardware capabilities
directly to user space. It makes use of the existing vfio and vfio-mdev
facilities, so a user application can send requests and DMA to the
hardware without interaction with the kernel. This removes the latency
of syscalls and context switches.

The patchset contains documentation with the details. Please refer to it for
more information.

This patchset is intended to be used with Jean Philippe Brucker's SVA
patch [1] (which is also at the RFC stage), but that is not mandatory. This
patchset was tested on the latest mainline kernel without the SVA patches,
so it supports only one process per accelerator.
If there is no sharing, then why not just assign the whole parent device to
the process? IMO if SVA usage is the clear goal of your series, that
should be stated clearly, and then Jean's series is a mandatory dependency...
We don't know what SVA will finally look like. But the feature, "make use of
per-PASID/substream ID IOMMU page table", should be able to be enabled in the
kernel. So we don't want to enforce it here. Once this series is ready, it
can be hooked to any implementation.

Furthermore, even without "per-PASID IOMMU page table", this series has its
value. It does not simply dedicate the whole device to the process. It "shares"
the device with the kernel driver, so you can support crypto and a user
application at the same time.
With SVA support, WarpDrive can support multi-process in the same
accelerator device.  We tested it in our SoC integrated Accelerator (board
ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].

We have noticed the IOMMU aware mdev RFC announced recently [3].

The IOMMU aware mdev has a similar idea but a different intention compared
to WarpDrive. It intends to dedicate part of the hardware resource to a VM.
Not just to a VM, though I/O Virtualization is in the name. You can assign
such an mdev to VMs, containers, or bare-metal processes. It's just
a fully isolated device from the user space point of view.
Oh, yes. Thank you for the clarification.
And the design is supposed to be used with Scalable I/O Virtualization.
By contrast, spimdev is intended to share the hardware resource with a large
number of processes. It just requires the hardware to support per-process
address translation (PCIe's PASID or ARM SMMU's substream ID).

But we don't see any serious conflict between the two designs. We believe they
can be unified into one.
Yes, there are things which can be shared, e.g. the interface to the
IOMMU.

Conceptually I see them as different mindsets on device resource sharing:

WarpDrive aims more to provide a generic framework to enable SVA
usages on various accelerators which lack a well-abstracted user
API like OpenCL. SVA is a hardware capability, so this is sort of exposing the
resources composing ONE capability to user space through the mdev framework.
It is not like a VF, which naturally carries most of the capabilities of the PF.
Yes. But we believe the user abstraction layer will be enabled soon once the
channel is opened. WarpDrive gives the hardware the chance to serve the
application directly. For example, an AI engine can be called by many processes
for inference. The resource need not be dedicated to one particular process.
Intel Scalable I/O Virtualization is a thorough design to partition the
device into minimal sharable copies (queue, queue pair, context),
while each copy carries most PF capabilities (including SVA), similar to
a VF. Also, with IOMMU scalable mode support, each copy can be
independently assigned to any client (process, container, VM, etc.)
Yes, we can see this intention.
Thanks
Kevin
Thank you.

-- 
			-Kenneth(Hisilicon)