Thread: 37 messages, 8 authors, 2018-11-27

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Kenneth Lee <hidden>
Date: 2018-11-21 03:01:16
Also in: linux-crypto, linux-doc, linux-rdma, lkml

On Tue, Nov 20, 2018 at 07:17:44AM +0200, Leon Romanovsky wrote:
Date: Tue, 20 Nov 2018 07:17:44 +0200
From: Leon Romanovsky <leon@kernel.org>
To: Kenneth Lee <redacted>
CC: Jason Gunthorpe <jgg@ziepe.ca>, Kenneth Lee <redacted>, Tim
 Sell [off-list ref], linux-doc@vger.kernel.org, Alexander
 Shishkin [off-list ref], Zaibo Xu
 [off-list ref], zhangfei.gao@foxmail.com, linuxarm@huawei.com,
 haojian.zhuang@linaro.org, Christoph Lameter [off-list ref], Hao Fang
 [off-list ref], Gavin Schenk [off-list ref], RDMA mailing
 list [off-list ref], Zhou Wang [off-list ref],
 Doug Ledford [off-list ref], Uwe Kleine-König
 [off-list ref], David Kershner
 [off-list ref], Johan Hovold [off-list ref], Cyrille
 Pitchen [off-list ref], Sagar Dharia
 [off-list ref], Jens Axboe [off-list ref],
 guodong.xu@linaro.org, linux-netdev [off-list ref], Randy Dunlap
 [off-list ref], linux-kernel@vger.kernel.org, Vinod Koul
 [off-list ref], linux-crypto@vger.kernel.org, Philippe Ombredanne
 [off-list ref], Sanyog Kale [off-list ref], "David S.
 Miller" [off-list ref], linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
User-Agent: Mutt/1.10.1 (2018-07-13)
Message-ID: [ref]

On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
quoted
On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
quoted
Date: Mon, 19 Nov 2018 11:49:54 -0700
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Kenneth Lee <redacted>
CC: Leon Romanovsky <leon@kernel.org>, Kenneth Lee <redacted>,
 Tim Sell [off-list ref], linux-doc@vger.kernel.org, Alexander
 Shishkin [off-list ref], Zaibo Xu
 [off-list ref], zhangfei.gao@foxmail.com, linuxarm@huawei.com,
 haojian.zhuang@linaro.org, Christoph Lameter [off-list ref], Hao Fang
 [off-list ref], Gavin Schenk [off-list ref], RDMA mailing
 list [off-list ref], Zhou Wang [off-list ref],
 Doug Ledford [off-list ref], Uwe Kleine-König
 [off-list ref], David Kershner
 [off-list ref], Johan Hovold [off-list ref], Cyrille
 Pitchen [off-list ref], Sagar Dharia
 [off-list ref], Jens Axboe [off-list ref],
 guodong.xu@linaro.org, linux-netdev [off-list ref], Randy Dunlap
 [off-list ref], linux-kernel@vger.kernel.org, Vinod Koul
 [off-list ref], linux-crypto@vger.kernel.org, Philippe Ombredanne
 [off-list ref], Sanyog Kale [off-list ref], "David S.
 Miller" [off-list ref], linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
User-Agent: Mutt/1.9.4 (2018-02-28)
Message-ID: [ref]

On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
quoted
If the hardware cannot share the page table with the CPU, we need some way to
change the device page table. This is what happens in ODP: it invalidates the
page table in the device upon the mmu_notifier callback. But this cannot solve
the COW problem. Suppose user process A shares a page P with the device, then A
forks a new process B and continues to write to the page. By COW, process B
will keep page P, while A will get a new page P'. But you have no way to let
the device know it should use P' rather than P.
Is this true? I thought mmu_notifiers covered all these cases.

The mmu_notifier for A should fire if B causes the physical address of
A's pages to change via COW.

And this causes the device page tables to re-synchronize.
I don't see such code. The current do_cow_fault() implementation has nothing to
do with mmu_notifier.
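
[Editor's illustration] The COW scenario debated above can be observed from
user space. The following is a minimal sketch, not code from the patch set: the
names cow_demo and struct cow_result are hypothetical. It only shows that after
fork() a write by A makes the two processes' views of the "same" page diverge,
which is the situation in which a device still holding the old translation
would be out of date. It does not inspect physical addresses or mmu_notifier
behaviour, so it does not settle which side of the argument is right.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type: what each process reads after the parent's write. */
struct cow_result {
    char parent_sees;
    char child_sees;
};

/* Returns 0 on success, -1 on any setup or I/O failure. */
int cow_demo(struct cow_result *out)
{
    int to_child[2], to_parent[2];
    char token = 0, child_byte = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    char *page;
    pid_t pid;
    int status;

    if (page_size <= 0 || pipe(to_child) < 0 || pipe(to_parent) < 0)
        return -1;

    /* A private anonymous page: shared copy-on-write across fork(). */
    page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    page[0] = 'A';                      /* page "P" as seen before the fork */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                     /* process B */
        close(to_child[1]);
        close(to_parent[0]);
        /* Block until the parent has done its write. */
        if (read(to_child[0], &token, 1) != 1)
            _exit(1);
        child_byte = page[0];           /* B still reads the pre-fork data */
        if (write(to_parent[1], &child_byte, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* process A */
    close(to_child[0]);
    close(to_parent[1]);
    page[0] = 'B';                      /* write fault: the COW sharing is broken */
    if (write(to_child[1], &token, 1) != 1)
        return -1;
    if (read(to_parent[0], &child_byte, 1) != 1)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    out->parent_sees = page[0];
    out->child_sees = child_byte;

    close(to_child[1]);
    close(to_parent[0]);
    munmap(page, (size_t)page_size);
    return 0;
}
```

Calling cow_demo() from a small main() and printing both fields shows the
parent reading its new value while the child still reads the old one, so the
two processes can no longer be backed by the same physical page.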
quoted
quoted
In WarpDrive/uacce, we make this simple. If you support an IOMMU and it supports
SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
to write any code for that because it has been done by the IOMMU framework. If it
Looks like the IOMMU code uses mmu_notifier, so it is identical to
IB's ODP. The only difference is that IB tends to have the IOMMU page
table in the device, not in the CPU.

The only case I know of that is different is the new-fangled CAPI
stuff where the IOMMU can directly use the CPU's page table and the
IOMMU page table (in device or CPU) is eliminated.
Yes. We are not focusing on the current implementation. As mentioned in the
cover letter, we are expecting Jean-Philippe's SVA patch:
git://linux-arm.org/linux-jpb.
quoted
Anyhow, I don't think a single instance of hardware should justify an
entire new subsystem. Subsystems are hard to make and without multiple
hardware examples there is no way to expect that it would cover any
future use cases.
Yes. That's our first expectation. We can keep it with our driver. But there is
no user driver support for any accelerator in the mainline kernel; even the
well-known QuickAssist has to be maintained out of tree. So we are trying to see
if people are interested in working together to solve the problem.
quoted
If all your driver needs is to mmap some PCI bar space, route
interrupts and do DMA mapping then mediated VFIO is probably a good
choice.
Yes. That is what was done in our RFCv1/v2. But we accepted Jerome's opinion and
tried not to add complexity to the mm subsystem.
quoted
If it needs to do a bunch of other stuff, not related to PCI bar
space, interrupts and DMA mapping (ie special code for compression,
crypto, AI, whatever) then you should probably do what Jerome said and
make a drivers/char/hisilicon_foo_bar.c that exposes just what your
hardware does.
Yes. If no other accelerator driver writer is interested, that is the
expectation :)

But we would really like to have a public solution here. Consider this scenario:

You create some connections (queues) to a NIC, an RSA engine, and an AI engine.
Then you get data directly from the NIC and pass the pointer to the RSA engine
for decryption. The CPU then finishes some data handling or other operation and
passes the data on to the AI engine for CNN calculation. This needs a place to
maintain the same address space by some means.
You are using NIC terminology. In the documentation, you wrote that it is needed
for DPDK use, and I don't really understand why we need another shiny new
interface for DPDK.
I'm not a DPDK expert. But we had some discussion with LNG of Linaro. They were
considering creating something similar to simplify the user driver. In most
cases, we use DPDK or ODP (Open Data Plane) just for a faster data plane data
flow. But much of the logic, such as setting the hardware mode, MAC address and
so on, does not need to be in user space. So they were looking for a way to keep
the driver in the kernel and expose just the ring buffer of some queues to user
space. This may simplify the user space design.
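
[Editor's illustration] A minimal sketch of the kind of queue ring buffer that
could be shared between a kernel driver and user space, assuming a single
producer and a single consumer. It is not the uacce or DPDK layout; struct
ring, struct ring_desc, ring_init, ring_push and ring_pop are hypothetical
names, and a real design would place the structure in memory that the driver
maps into the process with mmap().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u                   /* must be a power of two */

/* Hypothetical descriptor: a buffer address and length handed to the queue. */
struct ring_desc {
    uint64_t addr;
    uint32_t len;
};

/* head and tail are free-running counters; only their difference matters. */
struct ring {
    _Atomic uint32_t head;              /* next slot the producer fills */
    _Atomic uint32_t tail;              /* next slot the consumer drains */
    struct ring_desc slots[RING_SLOTS];
};

void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
bool ring_push(struct ring *r, struct ring_desc d)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = d;
    /* Release: the descriptor is visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(struct ring *r, struct ring_desc *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

With this split, mode setting, MAC address handling and similar control logic
can stay in the kernel driver, while user space only pushes and pops
descriptors.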
quoted
It is not complex, but it is helpful.
quoted
If you have networking involved in here then consider RDMA,
particularly if this functionality is already part of the same
hardware that the hns infiniband driver is servicing.

'computational MRs' are a reasonable approach to a side-car offload of
already existing RDMA support.
OK. Thanks. I will spend some time on it. But personally, I really don't like
RDMA's complexity. I cannot even try one single function without some
expensive hardware and complex connections in the lab. This is not like an
open source way.
That is not very accurate. We have the RXE driver, which is a virtual RDMA device
implemented purely in SW. It suffers from bad performance and
sporadic failures, but it is enough to try RDMA on your laptop in a VM.
Woo. This will be helpful. Thank you very much.
Thanks
quoted
quoted
Jason


-- 
			-Kenneth(Hisilicon)

================================================================================
This e-mail and its attachments contain confidential information from HUAWEI,
which is intended only for the person or entity whose address is listed above.
Any use of the information contained herein in any way (including, but not
limited to, total or partial disclosure, reproduction, or dissemination) by
persons other than the intended recipient(s) is prohibited. If you receive this
e-mail in error, please notify the sender by phone or email immediately and
delete it!