Thread: 18 messages, 5 authors, 2021-02-02

RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

From: Song Bao Hua (Barry Song) <hidden>
Date: 2021-01-28 01:30:15
Also in: linux-iommu, lkml

-----Original Message-----
From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
Sent: Wednesday, January 27, 2021 7:20 AM
To: Song Bao Hua (Barry Song) <redacted>
Cc: Wangzhou (B) <wangzhou1@hisilicon.com>; Greg Kroah-Hartman
[off-list ref]; Arnd Bergmann [off-list ref]; Zhangfei Gao
[off-list ref]; linux-accelerators@lists.ozlabs.org;
linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
linux-mm@kvack.org; Liguozhu (Kenneth) [off-list ref]; chensihang
(A) [off-list ref]
Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

On Tue, Jan 26, 2021 at 01:26:45AM +0000, Song Bao Hua (Barry Song) wrote:
On Mon, Jan 25, 2021 at 11:35:22PM +0000, Song Bao Hua (Barry Song) wrote:
On Mon, Jan 25, 2021 at 10:21:14PM +0000, Song Bao Hua (Barry Song)
wrote:
mlock, while certainly able to prevent swapping out, won't
be able to stop pages from moving due to:
* memory compaction in alloc_pages()
* making huge pages
* NUMA balancing
* memory compaction in CMA
Enabling those things is a major reason to have SVA device in the
first place, providing a SW API to turn it all off seems like the
wrong direction.
I wouldn't say this is a major reason to have SVA. If we read the
history of SVA and papers, people would think easy programming due
to data struct sharing between cpu and device, and process space
isolation in device would be the major reasons for SVA. SVA also
declares it supports zero-copy while zero-copy doesn't necessarily
depend on SVA.
Once you have to explicitly make system calls to declare memory under
IO, you lose all of that.

Since you've asked the app to be explicit about the DMAs it intends to
do, there is not really much reason to use SVA for those DMAs anymore.
Let's see a non-SVA case. We are not using SVA, we can have
a memory pool by hugetlb or pin, and app can allocate memory
from this pool, and get stable I/O performance on the memory
from the pool. But device has its separate page table which
is not bound with this process, thus lacking the protection
of process space isolation. Plus, the CPU and the device are using
different addresses.
So you are relying on the platform to do the SVA for the device?
Sorry for the late response.

uacce and its userspace framework UADK depend on SVA, leveraging
the enhanced security by isolated process address space.

This patch is mainly an extension for performance optimization to
get stable high-performance I/O on pinned memory even though the
hardware supports IO page fault to get pages back after swapping
out or page migration.
But IO page fault will cause serious latency jitter for high-speed
I/O.
Slow-speed devices don't need to use this extension.
This feels like it goes back to another topic where I felt the SVA
setup uAPI should be shared and not buried into every driver's unique
ioctls.

Having something like this in a shared SVA system is somewhat less
strange.
Sounds reasonable. On the other hand, uacce seems to be a common
uAPI for SVA, and probably the only one at the moment.

uacce is a framework, not a specific driver: any accelerator
can hook into this framework as long as the device provides
uacce_ops and registers itself with uacce_register(). uacce
itself doesn't bind to any specific hardware. So uacce interfaces
are kind of a common uAPI :-)
Jason
Thanks
Barry
