Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
From: Christophe Leroy <hidden>
Date: 2024-03-20 06:16:46
Also in:
linux-arm-kernel, linux-mm, lkml, sparclinux
On 20/03/2024 at 00:26, Jason Gunthorpe wrote:
> On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
>> On 18/03/2024 at 17:15, Jason Gunthorpe wrote:
>>> On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
>>>> On 14/03/2024 at 13:53, Peter Xu wrote:
>>>>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
>>>>>> On 13/03/2024 at 22:47, peterx@redhat.com wrote:
>>>>>>> From: Peter Xu <peterx@redhat.com>
>>>>>>>
>>>>>>> PowerPC book3s 4K mostly has the same definition on both, except
>>>>>>> pXd_huge() constantly returns 0 for hash MMUs. As Michael Ellerman
>>>>>>> pointed out [1], it is safe to check _PAGE_PTE on hash MMUs, as the
>>>>>>> bit will never be set so it will keep returning false.
>>>>>>>
>>>>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to
>>>>>>> create such huge mappings for 4K hash MMUs. Meanwhile, the major
>>>>>>> powerpc hugetlb pgtable walker __find_linux_pte() already used
>>>>>>> pXd_leaf() to check hugetlb mappings.
>>>>>>>
>>>>>>> The goal should be that we will have one API pXd_leaf() to detect
>>>>>>> all kinds of huge mappings. AFAICT we need to use the pXd_leaf()
>>>>>>> impl (rather than pXd_huge() ones) to make sure ie. THPs on hash
>>>>>>> MMU will also return true.
>>>>>>
>>>>>> All kinds of huge mappings ?
>>>>>>
>>>>>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There
>>>>>> are also huge mappings through hugepd. On powerpc 8xx we have 8M
>>>>>> huge pages and 512k huge pages. A PGD entry covers 4M so pgd_leaf()
>>>>>> won't report those huge pages.
>>>>>
>>>>> Ah yes, I should always mention this is in the context of leaf huge
>>>>> pages only. Do the examples you provided all fall into the hugepd
>>>>> category? If so I can reword the commit message, as:
>>>>
>>>> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
>>>>
>>>> The 512k hugepages are at PTE level, they are handled more or less
>>>> like CONT_PTE on ARM. See function set_huge_pte_at() for more
>>>> context.
>>>>
>>>> You can also look at pte_leaf_size() and pgd_leaf_size().
>>>
>>> IMHO leaf should return false if the thing is pointing to a next level
>>> page table, even if that next level is fully populated with contiguous
>>> pages.
>>>
>>> This seems more aligned with the contig page direction that hugepd
>>> should be moved over to..
>>
>> Should hugepd be moved to the contig page direction, really ?
>
> Sure? Is there any downside for the reading side to do so?
Probably not.
>> Would it be acceptable that an 8M hugepage requires 2048 contig entries
>> in 2 page tables, when the hugepd allows a single entry ?
>
> ? I thought we agreed the only difference would be that something new
> is needed to merge the two identical sibling page tables into one, ie
> you pay 2x the page table memory if that isn't fixed. That is write
> side only change and I imagine it could be done with a single PPC
> special API.
>
> Honestly not totally sure that is a big deal, it is already really
> memory inefficient compared to every other arch's huge page by needing
> the child page table in the first place.
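For reference, the "2048 contig entries in 2 page tables" figure can be reproduced with a small arithmetic sketch. It assumes 4K base pages (which is what the 2048 figure implies) and takes the 4M PGD entry coverage and the 512k/8M huge page sizes from earlier in the thread. The macro and function names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>

/* Assumed 8xx geometry, taken from the figures quoted in this thread. */
#define BASE_PAGE_SIZE  (4UL << 10)    /* 4K base page (assumption) */
#define PGD_ENTRY_SPAN  (4UL << 20)    /* one PGD entry covers 4M */
#define HUGE_512K       (512UL << 10)
#define HUGE_8M         (8UL << 20)

/* Number of contiguous base-page PTEs needed to map one huge page. */
static unsigned long contig_ptes(unsigned long huge_size)
{
	assert(huge_size % BASE_PAGE_SIZE == 0);
	return huge_size / BASE_PAGE_SIZE;
}

/*
 * Number of last-level page tables those PTEs are spread over, given that
 * each page table (one PGD entry) covers PGD_ENTRY_SPAN bytes. Rounds up,
 * and assumes the huge page is naturally aligned.
 */
static unsigned long page_tables_spanned(unsigned long huge_size)
{
	return (huge_size + PGD_ENTRY_SPAN - 1) / PGD_ENTRY_SPAN;
}
```

With these assumptions, contig_ptes(HUGE_8M) is 2048 and page_tables_spanned(HUGE_8M) is 2, while a 512k huge page needs 128 PTEs inside a single page table.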
>> Would it be acceptable performance-wise ?
>
> Isn't this particular PPC sub platform ancient? Are there current real
> users that are going to have hugetlbfs special code and care about
> this performance detail on a 6.20 era kernel?
Ancient yes, but still widely in use, and with the emergence of voice
over IP in Air Traffic Control, performance becomes more and more of a
challenge on those old boards, which have another 10 years in front of
them.
> In today's world wouldn't it be better for performance if these
> platforms could support THP by aligning to the contig API instead of
> being special?
Indeed, if we can promote THP that'd be even better.
> Am I wrong to question why we are polluting the core code for this
> special optimization?
In the first place that was to get a close fit between the hardware
page table topology and the Linux page table topology. But obviously we
already stepped back for 512k pages, so let's go one more step aside
and do something similar with 8M pages.

I'll give it a try and see how it goes.

Christophe