Thread (10 messages) 10 messages, 3 authors, 2020-04-01

Re: [RFC 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics

From: Catalin Marinas <catalin.marinas@arm.com>
Date: 2019-07-03 17:52:57
Also in: linux-mm, lkml

On Tue, Jul 02, 2019 at 09:07:28AM +0530, Anshuman Khandual wrote:
On 06/28/2019 03:50 PM, Catalin Marinas wrote:
quoted
On Thu, Jun 27, 2019 at 06:18:15PM +0530, Anshuman Khandual wrote:
quoted
pmd_present() and pmd_trans_huge() are expected to behave in the following
manner during various phases of a given PMD. It is derived from a previous
detailed discussion on this topic [1] and present THP documentation [2].

pmd_present(pmd):

- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)

pmd_trans_huge(pmd):

- Returns true if pmd refers to system RAM and is a trans huge mapping
[...]
quoted
Before we actually start fixing this, I would strongly suggest that you
add a boot selftest (see lib/Kconfig.debug for other similar cases)
which checks the consistency of the page table macros w.r.t. the
expected mm semantics. Once the mm maintainers agreed with the
semantics, it will really help architecture maintainers in implementing
them correctly.
Sure and it will help all architectures to be in sync wrt semantics.
quoted
You wouldn't need actual page tables, just things like assertions on
pmd_trans_huge(pmd_mkhuge(pmd)) == true. You could go further and have
checks on pmdp_invalidate(&dummy_vma, dummy_addr, &dummy_pmd) with the
dummy_* variables on the stack.
Hmm. I guess macros which operate directly on a page table entry will be
okay but the ones which check on specific states for VMA or MM might be
bit tricky. Try to emulate VMA/MM states while on stack ?. But sure, will
explore adding such a test.
You can pretend that the page table is on the stack. See the _pmd
variable in do_huge_pmd_wp_page_fallback() and
__split_huge_zero_page_pmd(). Similarly, the vma and even the mm can be
faked on the stack (see the arm64 tlb_flush()).
quoted
quoted
The problem:

PMD is first invalidated with pmdp_invalidate() before it's splitting. This
invalidation clears PMD_SECT_VALID as below.

PMD Split -> pmdp_invalidate() -> pmd_mknotpresent -> Clears PMD_SECT_VALID

Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
on the PMD entry.
I think that's an inconsistency in the expected semantics here. Do you
mean that pmd_present(pmd_mknotpresent(pmd)) should be true? If not, do
[...]
pmd_present() and pmd_mknotpresent() are not exact inverse.
I find this very confusing (not your fault, just the semantics expected
by the core code). I can see that x86 is using _PAGE_PSE to make
pmd_present(pmd_mknotpresent()) == true. However, for pud that's not the
case (because it's not used for transhuge).

I'd rather have this renamed to pmd_mknotvalid().
In absence of a positive section mapping bit on arm64, PTE_SPECIAL is being set
temporarily to remember that it was a mapped PMD which got invalidated recently
but which still points to memory. Hence pmd_present() must evaluate true.
I wonder if we can encode this safely for arm64 in the bottom two bits
of a pmd :

0b00 - not valid, not present
0b10 - not valid, present, huge
0b01 - valid, present, huge
0b11 - valid, table (not huge)

Do we ever call pmdp_invalidate() on a table entry? I don't think we do.

So a pte_mknotvalid would set bit 1 and I think swp_entry_to_pmd() would
have to clear it so that pmd_present() actually returns false for a swp
pmd entry.
quoted
we need to implement our own pmdp_invalidate() or change the generic one
to set a "special" bit instead of just a pmd_mknotpresent?
Though arm64 can subscribe __HAVE_ARCH_PMDP_INVALIDATE and implement it's own
pmdp_invalidate() in order to not call pmd_mknotpresent() and instead operate
on the invalid and special bits directly. But its not going to alter relevant
semantics here. AFAICS it might be bit better as it saves pmd_mknotpresent()
from putting in that special bit in there which it is not supposed do.

IFAICS there is no compelling reason for generic pmdp_invalidate() to change
either. It calls pmd_mknotpresent() which invalidates the entry through valid
or present bit and platforms which have dedicated huge page bit can still test
positive for pmd_present() after it's invalidation. It works for such platforms.
Platform specific override is required when invalidation via pmd_mknotpresent()
is not enough.
I'd really like the mknotpresent to be renamed to mknotvalid and then we
can keep pmdp_invalidate unchanged (well, calling mknotvalid instead).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help