Thread (78 messages, 11 authors, 2025-07-02)

Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

From: Peter Xu <peterx@redhat.com>
Date: 2025-06-24 20:53:13
Also in: linux-mm, lkml

On Tue, Jun 24, 2025 at 04:37:26PM -0400, Peter Xu wrote:
> On Thu, Jun 19, 2025 at 03:40:41PM -0300, Jason Gunthorpe wrote:
> > Even with this new version you have to decide to return PUD_SIZE or
> > bar_size in pci and your same reasoning that PUD_SIZE makes sense
> > applies (though I would probably return bar_size and just let the core
> > code cap it to PUD_SIZE)
> Yes.
>
> Today I went back to look at this.  I was trying to introduce this for
> file_operations:
>
> 	int (*get_mapping_order)(struct file *, unsigned long, size_t);
>
> It looks almost good, except that so far it has no way to return the
> physical address for the alignment calculation.
>
> For THP, the VA alignment is always calculated against pgoff, not against
> the physical address.  I think that's OK for THP, because every 2M THP
> folio is naturally 2M aligned in physical address space, so it fits when
> e.g. pgoff=0 in the calculation in thp_get_unmapped_area_vmflags().
>
> Logically it should also work for vfio-pci, as long as VFIO keeps using
> the lower 40 bits of the device_fd mmap offset to represent the bar
> offset, and as long as the PCIe spec keeps requiring PCI bars to be
> aligned to their sizes.
>
> But from an API POV, get_mapping_order() should also return something
> that can be used to calculate the VA alignment.  pgoff may not always be
> the right thing to align the VA against: after all, pgtable mapping is
> about VA -> PA, so the only reasonable and reliable way is to align the
> VA to the PA being mapped, and as an API we shouldn't assume pgoff is
> always aligned to the PA address space.
>
> Any thoughts?
I should have listed the currently viable next steps.  We have at least
these options:

(a) Ignore this issue and keep the get_mapping_order() interface as above,
    as long as it works for vfio-pci.

    I don't like this option.  I prefer the API (if we're going to
    introduce one) to be applicable no matter how pgoff is mapped to
    PAs.  I don't want the API to rely on a specific driver following a
    specific spec (in this case, PCI).

(b) I can make the new API like this instead:

    int (*get_mapping_order)(struct file *, unsigned long, unsigned long *, size_t);

    where the callback also returns a *phys_pgoff alongside the mapping
    order in the return value.  But that's not very pretty, if not
    outright ugly.

(c) Go back to the approach in the current v1, address the review
    comments, and keep using get_unmapped_area() until we find a better way.

I'll vote for (c), but I'm open to suggestions.

Thanks,

-- 
Peter Xu