On Fri, May 30, 2025 at 6:45 PM Ryan Roberts [off-list ref] wrote:
On 30/05/2025 17:26, Jann Horn wrote:
On Fri, May 30, 2025 at 4:04 PM Ryan Roberts [off-list ref] wrote:
pagemap_scan_pmd_entry() was previously modifying ptes while in lazy mmu
mode, then performing tlb maintenance for the modified ptes, then
leaving lazy mmu mode. But any pte modifications during lazy mmu mode
may be deferred until arch_leave_lazy_mmu_mode(), inverting the required
ordering between pte modification and tlb maintenance.
Let's fix that by leaving lazy mmu mode, forcing all the pte updates to be
actioned, before doing the tlb maintenance.
This is a theoretical bug discovered during code review.
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Hmm... isn't lazy mmu mode supposed to also delay TLB flushes, and
preserve the ordering of PTE modifications and TLB flushes?
Looking at the existing implementations of lazy MMU:
- In the Xen PV implementation of lazy MMU, I see that TLB flush
hypercalls are delayed as well (xen_flush_tlb(),
xen_flush_tlb_one_user() and xen_flush_tlb_multi() all use
xen_mc_issue(XEN_LAZY_MMU) which delays issuing if lazymmu is active).
- The sparc version also seems to delay TLB flushes, and sparc's
arch_leave_lazy_mmu_mode() seems to do TLB flushes via
flush_tlb_pending() if necessary.
- powerpc's arch_leave_lazy_mmu_mode() also seems to do TLB flushes.
Am I missing something?
I doubt it. I suspect this was just my misunderstanding then. I hadn't
appreciated that lazy mmu is also guaranteed to maintain flush ordering; it's
chronically under-documented. Sorry for the noise here. On that basis, I expect
the first 2 patches can definitely be dropped.
Yeah looking at this code I agree that it could use significantly more
verbose comments on the API contract.
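To make the ordering point concrete, here is a toy userspace model of the
contract as I understand it from this thread. It is not kernel code, and every
name in it (struct lazy_batch, submit(), enter_lazy(), leave_lazy(),
write_precedes_flush()) is hypothetical. It only illustrates that if TLB
flushes are queued in the same batch as the PTE writes, the write still takes
effect before the flush, whereas a flush issued immediately would overtake the
deferred write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum op_kind { OP_PTE_WRITE, OP_TLB_FLUSH };

#define MAX_OPS 16

struct lazy_batch {
	bool active;                  /* inside the modeled "lazy mmu mode"? */
	enum op_kind queued[MAX_OPS]; /* operations deferred until leave */
	size_t nr_queued;
};

/* Log of operations in the order they actually take effect. */
static enum op_kind applied[MAX_OPS];
static size_t nr_applied;

static void apply_op(enum op_kind op)
{
	assert(nr_applied < MAX_OPS);
	applied[nr_applied++] = op;
}

/* Queue the operation if we are in lazy mode and it is deferrable,
 * otherwise let it take effect immediately. */
static void submit(struct lazy_batch *b, enum op_kind op, bool defer)
{
	if (b->active && defer) {
		assert(b->nr_queued < MAX_OPS);
		b->queued[b->nr_queued++] = op;
	} else {
		apply_op(op);
	}
}

static void enter_lazy(struct lazy_batch *b)
{
	b->active = true;
	b->nr_queued = 0;
}

/* Leaving lazy mode replays everything that was queued, in submission order. */
static void leave_lazy(struct lazy_batch *b)
{
	for (size_t i = 0; i < b->nr_queued; i++)
		apply_op(b->queued[i]);
	b->nr_queued = 0;
	b->active = false;
}

/* Model "write PTE, flush TLB, leave lazy mode". Returns true if the
 * write took effect before the flush. */
static bool write_precedes_flush(bool defer_flush)
{
	struct lazy_batch b = { 0 };

	nr_applied = 0;
	enter_lazy(&b);
	submit(&b, OP_PTE_WRITE, true);        /* PTE writes are always deferred */
	submit(&b, OP_TLB_FLUSH, defer_flush); /* the flush may or may not be */
	leave_lazy(&b);

	return nr_applied == 2 &&
	       applied[0] == OP_PTE_WRITE &&
	       applied[1] == OP_TLB_FLUSH;
}
```

In this model, write_precedes_flush(true) corresponds to the behaviour
described above for Xen PV, sparc and powerpc, and write_precedes_flush(false)
corresponds to the inversion the patch was worried about.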