Re: [RFC PATCH 14/15] PCI/P2PDMA: Introduce pci_mmap_p2pmem()
From: Logan Gunthorpe <logang@deltatee.com>
Date: 2020-11-06 20:03:40
Also in:
linux-iommu, linux-mm, linux-nvme, linux-pci, lkml
On 2020-11-06 12:53 p.m., Jason Gunthorpe wrote:
On Fri, Nov 06, 2020 at 12:44:59PM -0700, Logan Gunthorpe wrote:quoted
On 2020-11-06 12:30 p.m., Jason Gunthorpe wrote:quoted
quoted
I certainly can't make decisions for code that isn't currently upstream.The rdma drivers are all upstream, what are you thinking about?Really? I feel like you should know what I mean here... I mean upstream code that actually uses the APIs that I'd have to introduce. I can't say here's an API feature that no code uses but the already upstream rdma driver might use eventually. It's fairly easy to send patches that make the necessary changes when someone adds a use of those changes inside the rdma code.Sure, but that doesn't mean you have to actively design things to be unusable beyond this narrow case. The RDMA drivers are all there, all upstream, if this work is accepted then the changes to insert P2P pages into their existing mmap flows is a reasonable usecase to consider at this point when building core code APIs. You shouldn't add dead code, but at least have a design in mind for what it needs to look like and some allowance.
I don't see anything I've done that's at odds with that. You will still need to make changes to the p2pdma code to implement your use case.
quoted
quoted
quoted
Ultimately, if you aren't using the genpool you will have to implement your own mmap operation that somehow allocates the pages and your own page_free hook.Sure, the mlx5 driver already has a specialized alloctor for it's BAR pages.So it *might* make sense to carve out a common helper to setup a VMA for P2PDMA to do the vm_flags check and set VM_MIXEDMAP... but besides that, there's no code that would be common to the two cases.I think the whole insertion of P2PDMA pages into a VMA should be similar to io_remap_pfn() so all the VM flags, pgprot_decrypted and other subtle details are all in one place. (also it needs a pgprot_decrypted before doing vmf_insert, I just learned that this month)
I don't think a function like that will work for the p2pmem use case. In
order to implement proper page freeing I expect I'll need a loop around
the allocator and vm_insert_mixed()... Something roughly like:
for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
vaddr = pci_alloc_p2pmem(pdev, PAGE_SIZE);
ret = vmf_insert_mixed(vma, addr,
__pfn_to_pfn_t(virt_to_pfn(vaddr), PFN_DEV | PFN_MAP));
}
That way we can call pci_free_p2pmem() when a page's ref count goes to
zero. I suspect your use case will need to do something similar.
Logan