Thread (36 messages) 36 messages, 4 authors, 2024-02-22

Re: [PATCH v2 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2024-01-16 12:31:44
Also in: linux-arm-kernel, linux-mm, linux-riscv, lkml

On Tue, Jan 16, 2024 at 06:30:39AM +0000, Christophe Leroy wrote:

Le 15/01/2024 à 19:37, Jason Gunthorpe a écrit :
quoted
On Wed, Jan 03, 2024 at 05:14:16PM +0800, peterx@redhat.com wrote:
quoted
From: Peter Xu <peterx@redhat.com>

Hugepd format for GUP is only used in PowerPC with hugetlbfs.  There are
some kernel usage of hugepd (can refer to hugepd_populate_kernel() for
PPC_8XX), however those pages are not candidates for GUP.

Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to
file-backed mappings") added a check to fail gup-fast if there's potential
risk of violating GUP over writeback file systems.  That should never apply
to hugepd.  Considering that hugepd is an old format (and even
software-only), there's no plan to extend hugepd into other file typed
memories that is prone to the same issue.
I didn't dig into the ppc stuff too deeply, but this looks to me like
it is the same thing as ARM's contig bits?

ie a chunk of PMD/etc entries are all managed together as though they
are a virtual larger entry and we use the hugepte_addr_end() stuff to
iterate over each sub entry.
As far as I understand ARM's contig stuff, hugepd on powerpc is 
something different.

hugepd is a page directory dedicated to huge pages, where you have huge 
pages listed instead of regular pages. For instance, on powerpc 32 with 
each PGD entries covering 4Mbytes, a regular page table has 1024 PTEs. A 
hugepd for 512k is a page table with 8 entries.

And for 8Mbytes entries, the hugepd is a page table with only one entry. 
And 2 consecutive PGS entries will point to the same hugepd to cover the 
entire 8Mbytes.
That still sounds alot like the ARM thing - except ARM replicates the
entry, you also said PPC relicates the entry like ARM to get to the
8M?

I guess the difference is in how the table memory is layed out? ARM
marks the size in the same entry that has the physical address so the
entries are self describing and then replicated. It kind of sounds
like PPC is marking the size in prior level and then reconfiguring the
layout of the lower level? Otherwise it surely must do the same
replication to make a radix index work..

If yes, I guess that is the main problem, the mm APIs don't have way
today to convey data from the pgd level to understand how to parse the
pmd level?
quoted
It seems to me we should see ARM and PPC agree on what the API is for
this and then get rid of hugepd by making both use the same page table
walker API. Is that too hopeful?
Can't see the similarity between ARM contig PTE and PPC huge page 
directories.
Well, they are both variable sized entries.

So if you imagine a pmd_leaf(), pmd_leaf_size() and a pte_leaf_size()
that would return enough information for both.

Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help