Thread (23 messages) 23 messages, 4 authors, 2023-07-16

Re: [PATCH v1 11/14] arm64/mm: Wire up PTE_CONT for user mappings

From: Ryan Roberts <ryan.roberts@arm.com>
Date: 2023-07-05 13:13:42
Also in: linux-mm, lkml

On 04/07/2023 12:09, Ryan Roberts wrote:
On 03/07/2023 16:17, Catalin Marinas wrote:
quoted
Hi Ryan,
...
quoted
quoted
+
+int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
+					unsigned long addr, pte_t *ptep,
+					pte_t entry, int dirty)
+{
+	pte_t orig_pte;
+	int i;
+
+	/*
+	 * Gather the access/dirty bits for the contiguous range. If nothing has
+	 * changed, its a noop.
+	 */
+	orig_pte = ptep_get(ptep);
+	if (pte_val(orig_pte) == pte_val(entry))
+		return 0;
+
+	/*
+	 * We can fix up access/dirty bits without having to unfold/fold the
+	 * contig range. But if the write bit is changing, we need to go through
+	 * the full unfold/fold cycle.
+	 */
+	if (pte_write(orig_pte) == pte_write(entry)) {
Depending on the architecture version, pte_write() either checks a
software only bit or it checks the DBM one.
quoted
+		/*
+		 * No need to flush here; This is always "more permissive" so we
+		 * can only be _adding_ the access or dirty bit. And since the
+		 * tlb can't cache an entry without the AF set and the dirty bit
+		 * is a SW bit, there can be no confusion. For HW access
+		 * management, we technically only need to update the flag on a
+		 * single pte in the range. But for SW access management, we
+		 * need to update all the ptes to prevent extra faults.
+		 */
On pre-DBM hardware, a PTE_RDONLY entry (writable from the kernel
perspective but clean) may be cached in the TLB and we do need flushing.
I don't follow; The Arm ARM says:

  IPNQBP When an Access flag fault is generated, the translation table entry
         causing the fault is not cached in a TLB.

So the entry can only be in the TLB if AF is already 1. And given the dirty bit
is SW, it shouldn't affect the TLB state. And this function promises to only
change the bits so they are more permissive (so AF=0 -> AF=1, D=0 -> D=1).

So I'm not sure what case you are describing here?
Ahh sorry, I get your point now - on pre-DBM hardware, the HW sees a read-only
PTE when the kernel considers it clean and this can be in the TLB. Then when
making it dirty (from kernel's perspective), we are removing the read-only
protection from the HW perspective, so we need to flush the TLB entry.
quoted
quoted
+		ptep = contpte_align_down(ptep);
+		addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
+
+		for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
+			__ptep_set_access_flags(vma, addr, ptep, entry, 0);
Fixed by adding this after iterating though the ptes, intent it to avoid the
per-page tlb flash and instead flush the whole range at the end:

		if (dirty)
			__flush_tlb_range(vma, start_addr, addr,
							PAGE_SIZE, true, 3);
quoted
quoted
+	} else {
+		/*
+		 * No need to flush in __ptep_set_access_flags() because we just
+		 * flushed the whole range in __contpte_try_unfold().
+		 */
+		__contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte);
+		__ptep_set_access_flags(vma, addr, ptep, entry, 0);
I also think this is wrong; we must pass `dirty` as the last parameter so that
__ptep_set_access_flags will flush if neccessary. My comment about having just
done the flush is incorrect - we have just done a flush, but the ptes are still
valid with their old value so the HW could pull this into the TLB before we
modify the value.
quoted
quoted
+		contpte_try_fold(vma->vm_mm, addr, ptep, entry);
+	}
+
+	return 1;
+}
Thanks,
Ryan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help