Thread (10 messages), 3 authors, 2021-11-27

Re: [PATCH v4 for-next 1/1] RDMA/hns: Support direct wqe of userspace

From: Wenpeng Liang <hidden>
Date: 2021-11-26 08:27:32

On 2021/11/26 1:50, Jason Gunthorpe wrote:
On Mon, Nov 22, 2021 at 10:58:09AM +0200, Leon Romanovsky wrote:
On Mon, Nov 22, 2021 at 11:38:01AM +0800, Wenpeng Liang wrote:
From: Yixing Liu <redacted>

Add direct wqe enable switch and address mapping.

Signed-off-by: Yixing Liu <redacted>
Signed-off-by: Wenpeng Liang <redacted>
 drivers/infiniband/hw/hns/hns_roce_device.h |  8 +--
 drivers/infiniband/hw/hns/hns_roce_main.c   | 38 ++++++++++++---
 drivers/infiniband/hw/hns/hns_roce_pd.c     |  3 ++
 drivers/infiniband/hw/hns/hns_roce_qp.c     | 54 ++++++++++++++++++++-
 include/uapi/rdma/hns-abi.h                 |  2 +
 5 files changed, 94 insertions(+), 11 deletions(-)
<...>
 	entry = to_hns_mmap(rdma_entry);
 	pfn = entry->address >> PAGE_SHIFT;
-	prot = vma->vm_page_prot;
 
-	if (entry->mmap_type != HNS_ROCE_MMAP_TYPE_TPTR)
-		prot = pgprot_noncached(prot);
+	switch (entry->mmap_type) {
+	case HNS_ROCE_MMAP_TYPE_DB:
+		prot = pgprot_noncached(vma->vm_page_prot);
+		break;
+	case HNS_ROCE_MMAP_TYPE_TPTR:
+		prot = vma->vm_page_prot;
+		break;
+	case HNS_ROCE_MMAP_TYPE_DWQE:
+		prot = pgprot_device(vma->vm_page_prot);

Everything is fine, except this pgprot_device(). You probably need to check
WC internally in your driver and use either pgprot_writecombine() or
pgprot_noncached() explicitly.

pgprot_device is only used in two places in the kernel
pci_mmap_resource_range() for setting up the sysfs resourceXX mmap

And in pci_remap_iospace() as part of emulating PIO on mmio
architectures

So, a PCI device should always be using pgprot_device() in its mmap
function

The question is why is pgprot_noncached() being used at all? The only
difference on ARM is that noncached is non-Early Write Acknowledgement (nE)
and device is not.

At the very least this should be explained in a comment why nE vs E is
required in all these cases.

Jason
.
HIP09 is an SoC device, and our CPU optimizes ST4 instructions only for memory
mapped with the device attribute. Therefore, we use the device attribute to get
that optimization.

The device attribute allows early ack, so it is faster than noncached. To
ensure that early ack works correctly, our device still rings the doorbell
according to the content of the first 8 bytes and completes the data
transmission, even if the data is incomplete.

Thanks
Wenpeng
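
For reference, here is a rough userspace sketch of the attribute choice discussed above, with the kind of explanatory comments Jason asked for. It is not the driver code: the mock_* enums and mock_pick_attr() are hypothetical names made up for illustration, and the rejecting default case is an assumption, since the quoted hunk does not show one. The arm64 mapping it relies on is that pgprot_noncached() gives Device-nGnRnE and pgprot_device() gives Device-nGnRE.

```c
/* Userspace mock only: these names are illustrative, not kernel API. */
enum mock_mmap_type {
	MOCK_MMAP_TYPE_DB,	/* stands in for HNS_ROCE_MMAP_TYPE_DB */
	MOCK_MMAP_TYPE_TPTR,	/* stands in for HNS_ROCE_MMAP_TYPE_TPTR */
	MOCK_MMAP_TYPE_DWQE	/* stands in for HNS_ROCE_MMAP_TYPE_DWQE */
};

enum mock_mem_attr {
	MOCK_ATTR_UNCHANGED,		/* vma->vm_page_prot used as-is */
	MOCK_ATTR_DEVICE_nGnRnE,	/* arm64 pgprot_noncached(): no early write ack */
	MOCK_ATTR_DEVICE_nGnRE,		/* arm64 pgprot_device(): early write ack allowed */
	MOCK_ATTR_INVALID		/* assumed: unknown types are rejected */
};

/* Mirrors the shape of the switch in the quoted hunk. */
enum mock_mem_attr mock_pick_attr(enum mock_mmap_type type)
{
	switch (type) {
	case MOCK_MMAP_TYPE_DB:
		/* Doorbell page: nE, so a write is acked only by the endpoint. */
		return MOCK_ATTR_DEVICE_nGnRnE;
	case MOCK_MMAP_TYPE_TPTR:
		/* Tail pointer page: keep whatever protection the vma has. */
		return MOCK_ATTR_UNCHANGED;
	case MOCK_MMAP_TYPE_DWQE:
		/*
		 * Direct WQE page: E (early ack) is acceptable because, per
		 * the reply above, the device acts on the first 8 bytes even
		 * if the rest of the data is incomplete, and the CPU only
		 * optimizes ST4 for the device attribute.
		 */
		return MOCK_ATTR_DEVICE_nGnRE;
	default:
		return MOCK_ATTR_INVALID;
	}
}
```

In the real mmap handler the returned value would correspond to the prot passed on to the remap call; the mock only makes the nE vs E decision per mmap type explicit.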