Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()
From: Christoph Hellwig <hch@lst.de>
Date: 2018-01-08 18:34:39
Also in:
linux-nvme, linux-pci, linux-rdma, lkml, nvdimm
On Mon, Jan 08, 2018 at 11:09:17AM -0700, Jason Gunthorpe wrote:
quoted
As usual we implement what actually has a consumer. On top of that the R/W API is the only core RDMA API that actually does DMA mapping for the ULP at the moment.
Well again the same can be said for dma_map_page vs dma_map_sg...
I don't understand this comment.
quoted
For SENDs and everything else dma maps are done by the ULP (I'd like to eventually change that, though - e.g. sends through that are inline to the workqueue don't need a dma map to start with).
quoted
That's because the initial design was to let the ULPs do the DMA mappings, which fundamentally is wrong. I've fixed it for the R/W API when adding it, but no one has started work on SENDs and atomics.
Well, you know why it is like this, and it is very complicated to unwind - the HW driver does not have enough information during CQ processing to properly do any unmaps, let alone serious error tear down unmaps, so we'd need a bunch of new APIs developed first, like RW did. :\
Yes, if it was trivial we would have done it already.
quoted
quoted
And on that topic, does this scheme work with HFI?
No, and I guess we need an opt-out. HFI generally seems to be extremely weird.
This series needs some kind of fix so HFI, QIB, rxe, etc don't get broken, and it shouldn't be 'fixed' at the RDMA level.
I don't think rxe is a problem as it won't show up as a PCI device. HFI and QIB do show up as PCI devices, and could be used for P2P transfers from the PCI point of view. It's just that they have a layer of software indirection between their hardware and what is exposed at the RDMA layer. So I very much disagree about where to place that workaround - the RDMA code is exactly the right place.
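To make the opt-out idea concrete, here is a minimal userspace sketch of the check I have in mind. It is a toy model, not kernel code: struct toy_rdma_dev, its two flags and toy_rdma_p2p_allowed() are made-up names for illustration and do not correspond to any existing ib_device field or helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RDMA device; not struct ib_device. */
struct toy_rdma_dev {
	const char *name;
	bool is_pci;          /* shows up as a PCI device */
	bool sw_indirection;  /* software layer between the hardware and RDMA */
};

/*
 * Decide in the RDMA layer whether P2P memory may be handed to this device.
 * Devices that are not PCI devices, or that put software between their
 * hardware and what RDMA exposes, are opted out.
 */
bool toy_rdma_p2p_allowed(const struct toy_rdma_dev *dev)
{
	assert(dev != NULL);

	if (!dev->is_pci)
		return false;   /* nothing for a peer PCI device to talk to */
	if (dev->sw_indirection)
		return false;   /* data path is not plain device DMA */
	return true;
}
```

In this model a software-only device and a PCI device with software indirection are both refused, while an ordinary PCI HCA is accepted, and none of that needs to be visible below the RDMA code.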
quoted
quoted
This is why P2P must fit in to the common DMA framework somehow, we rely on these abstractions to work properly and fully in RDMA.
Moving P2P up to common RDMA code isn't going to fix this. For that we need to stop pretending that something that isn't DMA can abuse the dma mapping framework, and until then opt them out of behavior that assumes actual DMA like P2P.
It could, if we had a DMA op for p2p then the drivers that provide their own ops can implement it appropriately or not at all. E.g. the correct implementation for rxe to support p2p memory is probably somewhat straightforward.
But P2P is _not_ a factor of the dma_ops implementation at all, it is something that happens behind the dma_map implementation. Think about what the dma mapping routines do: (a) translate from host addresses to bus addresses and (b) flush caches (on non-coherent architectures). Both are obviously not needed for P2P transfers, as they never reach the host.
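A rough userspace sketch of that argument, purely as an illustration. Nothing here is the real DMA API: struct toy_dev, struct toy_page and toy_map_page() are invented names, the translation is modelled as a fixed offset, and the sketch assumes the P2P page already carries the bus address of the peer's BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page descriptor, not struct page. */
struct toy_page {
	uint64_t addr;   /* host physical address, or bus address if is_p2p */
	bool is_p2p;     /* memory lives in a peer device's BAR */
};

/* Hypothetical device, not struct device. */
struct toy_dev {
	uint64_t bus_offset;   /* host physical -> bus address offset */
	bool coherent;         /* false: caches need maintenance before DMA */
	unsigned int flushes;  /* counts simulated cache flushes */
};

/* Return the address the device should use to reach the page. */
uint64_t toy_map_page(struct toy_dev *dev, const struct toy_page *pg)
{
	assert(dev != NULL && pg != NULL);

	if (pg->is_p2p)
		return pg->addr;   /* never reaches the host: skip (a) and (b) */

	if (!dev->coherent)
		dev->flushes++;    /* (b) flush caches on non-coherent systems */

	return pg->addr + dev->bus_offset;   /* (a) host -> bus translation */
}
```

For a host page on a non-coherent toy device the call does one simulated flush and returns a translated address; for a P2P page it returns the address unchanged and touches no caches, which is the whole point.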
Very long term the IOMMUs under the ops will need to care about this, so the wrapper is not an optimal place to put it - but I wouldn't object if it gets it out of RDMA :)
Unless you have an IOMMU on your PCIe switch, and not before/inside the root complex, that is not correct.