Thread (89 messages) 89 messages, 4 authors, 2019-07-03

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

From: Christoph Hellwig <hch@lst.de>
Date: 2019-06-28 13:38:43
Also in: linux-nvme, linux-pci, linux-rdma, lkml

On Thu, Jun 27, 2019 at 12:00:35PM -0600, Logan Gunthorpe wrote:
quoted
It is not.  (c) is fundamentally very different as it is not actually
an operation that ever goes out to the wire at all, and which is why the
actual physical address on the wire does not matter at all.
Some interfaces like NVMe have designed it in a way that it the commands
used to do this internal transfer look like (b2), but that is just their
(IMHO very questionable) interface design choice, that produces a whole
chain of problems.
quoted
From the mapping device's driver's perspective yes, but from the
perspective of a submitting driver they would be the same.
With your dma_addr_t scheme it won't be the same, as you'd need
a magic way to generate the internal addressing and stuff it into
the dma_addr_t.  With a phys_addr_t based scheme they should basically
be all the same.
Yes, you did suggest them. But what I'm trying to suggest is we don't
*necessarily* need the lookup. For demonstration purposes only, a
submitting driver could very roughly potentially do:

struct bio_vec vec;
dist = pci_p2pdma_dist(provider_pdev, mapping_pdev);
if (dist < 0) {
     /* use regular memory */
     vec.bv_addr = virt_to_phys(kmalloc(...));
     vec.bv_flags = 0;
} else if (dist & PCI_P2PDMA_THRU_HOST_BRIDGE) {
     vec.bv_addr = pci_p2pmem_alloc_phys(provider_pdev, ...);
     vec.bv_flags = BVEC_MAP_RESOURCE;
} else {
     vec.bv_addr = pci_p2pmem_alloc_bus_addr(provider_pdev, ...);
     vec.bv_flags = BVEC_MAP_BUS_ADDR;
}
That doesn't look too bad, except..
-- And a mapping driver would roughly just do:

dma_addr_t dma_addr;
if (vec.bv_flags & BVEC_MAP_BUS_ADDR) {
     if (pci_bus_addr_in_bar(mapping_pdev, vec.bv_addr, &bar, &off))  {
          /* case (c) */
          /* program the DMA engine with bar and off */
Why bother with that here if we could also let the caller handle
that? pci_p2pdma_dist() should be able to trivially find that out
based on provider_dev == mapping_dev.
The real difficulty here is that you'd really want all the above handled
by a dma_map_bvec() so it can combine every vector hitting the IOMMU
into a single continuous IOVA -- but it's hard to fit case (c) into that
equation. So it might be that a dma_map_bvec() handles cases (a), (b1)
and (b2) and the mapping driver has to then check each resulting DMA
vector for pci_bus_addr_in_bar() while it is programming the DMA engine
to deal with case (c).
I'd do it the other way around.  pci_p2pdma_dist is used to find
the p2p type.  The p2p type is stuff into the bio_vec, and we then:

 (1) manually check for case (c) in driver for drivers that want to
     treat it different from (b)
 (2) we then have a dma mapping wrapper that checks the p2p type
     and does the right thing for the rest.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help