Re: [RFC PATCH v4 13/16] powerpc/e500: Use contiguous PMD instead of hugepd
From: Oscar Salvador <hidden>
Date: 2024-05-29 08:49:25
Also in:
linux-mm, lkml
On Mon, May 27, 2024 at 03:30:11PM +0200, Christophe Leroy wrote:
e500 supports many page sizes among which the following sizes are implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G.

On e500, TLB miss for hugepages is exclusively handled by SW even on e6500 which has HW assistance for 4k pages, so there are no constraints like on the 8xx.

On e500/32, all are at PGD/PMD level and can be handled as cont-PMD.

On e500/64, smaller ones are on PMD while bigger ones are on PUD. Again, they can easily be handled as cont-PMD and cont-PUD instead of hugepd.

Signed-off-by: Christophe Leroy <redacted>
...
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 90d6a0943b35..f7421d1a1693 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -52,11 +52,36 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
 {
 	pte_basic_t old = pte_val(*p);
 	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
+	unsigned long sz;
+	unsigned long pdsize;
+	int i;
 	if (new == old)
 		return old;
-	*p = __pte(new);
+#ifdef CONFIG_PPC_E500
+	if (huge)
+		sz = 1UL << (((old & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
+	else

I think this will not compile when CONFIG_PPC_85xx && !CONFIG_PTE_64BIT. You have declared _PAGE_HSIZE_MSK and _PAGE_HSIZE_SHIFT in arch/powerpc/include/asm/nohash/hugetlb-e500.h. But hugetlb-e500.h is only included if CONFIG_PPC_85xx && CONFIG_PTE_64BIT (see arch/powerpc/include/asm/nohash/32/pgtable.h).

+#endif
+		sz = PAGE_SIZE;
+
+	if (!huge || sz < PMD_SIZE)
+		pdsize = PAGE_SIZE;
+	else if (sz < PUD_SIZE)
+		pdsize = PMD_SIZE;
+	else if (sz < P4D_SIZE)
+		pdsize = PUD_SIZE;
+	else if (sz < PGDIR_SIZE)
+		pdsize = P4D_SIZE;
+	else
+		pdsize = PGDIR_SIZE;
+
+	for (i = 0; i < sz / pdsize; i++, p++) {
+		*p = __pte(new);
+		if (new)
+			new += (unsigned long long)(pdsize / PAGE_SIZE) << PTE_RPN_SHIFT;

I guess 'new' can be 0 when pte_update() is called to clear the pte?

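If that reading is right, the guard keeps every entry of a contiguous range at zero on a clear, instead of writing increasing non-zero values into the later entries. A minimal user-space sketch of that behaviour follows. `fill_entries()` and `SKETCH_RPN_SHIFT` are hypothetical stand-ins invented for this illustration, not kernel names, and the shift value is an arbitrary assumption rather than the real PTE_RPN_SHIFT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for illustration only; the real PTE_RPN_SHIFT is arch-defined. */
#define SKETCH_RPN_SHIFT 24

/*
 * Write 'val' into 'nr' consecutive entries, advancing the page number
 * by 'pages_per_entry' for each entry. A zero 'val' (entry being
 * cleared) is never advanced, so all entries end up zero.
 */
static void fill_entries(uint64_t *p, size_t nr, uint64_t val,
			 uint64_t pages_per_entry)
{
	assert(p != NULL);

	for (size_t i = 0; i < nr; i++, p++) {
		*p = val;
		if (val)
			val += pages_per_entry << SKETCH_RPN_SHIFT;
	}
}
```

Without the `if (val)` check, clearing a range would leave only the first entry at zero.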
+static inline unsigned long pmd_leaf_size(pmd_t pmd)
+{
+	return 1UL << (((pmd_val(pmd) & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);

Can we have the '20' defined somewhere, with a comment on top explaining what it is, so that it is not a magic number? Otherwise people might look at this and wonder why 20.

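Something along these lines, as a minimal user-space sketch. Going by the formula above, the size field holds log2 of the page size in MiB, so 20 is log2(1 MiB). All `SKETCH_*` names, the field position and the field width are hypothetical, chosen only for this illustration; the real _PAGE_HSIZE_MSK and _PAGE_HSIZE_SHIFT are whatever hugetlb-e500.h defines.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define SKETCH_HSIZE_SHIFT	8
#define SKETCH_HSIZE_MSK	(0xfULL << SKETCH_HSIZE_SHIFT)

/*
 * The size field stores log2(page size in MiB). Adding log2(1 MiB) = 20
 * turns it into log2(page size in bytes).
 */
#define SKETCH_HSIZE_MIB_SHIFT	20

/* Decode the leaf size in bytes from an entry value. */
static inline uint64_t leaf_size(uint64_t entry)
{
	return 1ULL << (((entry & SKETCH_HSIZE_MSK) >> SKETCH_HSIZE_SHIFT) +
			SKETCH_HSIZE_MIB_SHIFT);
}

/* Encode log2(size in MiB) into the size field of an entry. */
static inline uint64_t encode_size(unsigned int log2_mib)
{
	assert(log2_mib <= 0xf);
	return (uint64_t)log2_mib << SKETCH_HSIZE_SHIFT;
}
```

With this encoding, a field value of 2 decodes to 4M and a field value of 10 decodes to 1G.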
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -331,6 +331,37 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		__set_huge_pte_at(pmdp, ptep, pte_val(pte));
 	}
 }
+#elif defined(CONFIG_PPC_E500)
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		     pte_t pte, unsigned long sz)
+{
+	unsigned long pdsize;
+	int i;
+
+	pte = set_pte_filter(pte, addr);
+
+	/*
+	 * Make sure hardware valid bit is not set. We don't do
+	 * tlb flush for this update.
+	 */
+	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
+
+	if (sz < PMD_SIZE)
+		pdsize = PAGE_SIZE;
+	else if (sz < PUD_SIZE)
+		pdsize = PMD_SIZE;
+	else if (sz < P4D_SIZE)
+		pdsize = PUD_SIZE;
+	else if (sz < PGDIR_SIZE)
+		pdsize = P4D_SIZE;
+	else
+		pdsize = PGDIR_SIZE;
+
+	for (i = 0; i < sz / pdsize; i++, ptep++, addr += pdsize) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		pte = __pte(pte_val(pte) + ((unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT));

Can you use pte_advance_pfn() here? Just have

	nr = pdsize / PAGE_SIZE;
	pte = pte_advance_pfn(pte, nr);

(the generic helper takes a number of pages and applies the PFN_PTE_SHIFT shift itself).

Which 'sz's can we have here? You mentioned that e500 supports 4M, 16M, 64M, 256M and 1G. Which of these can be huge?

-- 
Oscar Salvador
SUSE Labs