Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range()... | linux-perf-users

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

From: Joel Fernandes <joelagnelf@nvidia.com>
Date: 2025-02-19 15:05:52
Also in: bpf, kvm, linux-arch, linux-hardening, linux-kselftest, linux-mm, linux-riscv, lkml, loongarch, rcu, virtualization, xen-devel

On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
> On 17/01/25 16:52, Jann Horn wrote:
>> On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider [off-list ref] wrote:
>>> On 14/01/25 19:16, Jann Horn wrote:
>>>> On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider [off-list ref] wrote:
>>>>> vunmap()'s issued from housekeeping CPUs are a relatively common source of
>>>>> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>>>>> flush_tlb_kernel_range() IPIs.
>>>>>
>>>>> Given that CPUs executing in userspace do not access data in the vmalloc
>>>>> range, these IPIs could be deferred until their next kernel entry.
>>>>>
>>>>> Deferral vs early entry danger zone
>>>>> ===================================
>>>>>
>>>>> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
>>>>> and then accessed in early entry code.
>>>>
>>>> In other words, it needs a guarantee that no vmalloc allocations that
>>>> have been created in the vmalloc region while the CPU was idle can
>>>> then be accessed during early entry, right?
>>>
>>> I'm not sure if that would be a problem (not an mm expert, please do
>>> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
>>> deferred anyway.
>>
>> flush_cache_vmap() is about stuff like flushing data caches on
>> architectures with virtually indexed caches; that doesn't do TLB
>> maintenance. When you look for its definition on x86 or arm64, you'll
>> see that they use the generic implementation which is simply an empty
>> inline function.
>>
>>> So after vmapping something, I wouldn't expect isolated CPUs to have
>>> invalid TLB entries for the newly vmapped page.
>>>
>>> However, upon vunmap'ing something, the TLB flush is deferred, and thus
>>> stale TLB entries can and will remain on isolated CPUs, up until they
>>> execute the deferred flush themselves (IOW for the entire duration of the
>>> "danger zone").
>>>
>>> Does that make sense?
>>
>> The design idea wrt TLB flushes in the vmap code is that you don't do
>> TLB flushes when you unmap stuff or when you map stuff, because doing
>> TLB flushes across the entire system on every vmap/vunmap would be a
>> bit costly; instead you just do batched TLB flushes in between, in
>> __purge_vmap_area_lazy().
>>
>> In other words, the basic idea is that you can keep calling vmap() and
>> vunmap() a bunch of times without ever doing TLB flushes until you run
>> out of virtual memory in the vmap region; then you do one big TLB
>> flush, and afterwards you can reuse the free virtual address space for
>> new allocations again.
>>
>> So if you "defer" that batched TLB flush for CPUs that are not
>> currently running in the kernel, I think the consequence is that those
>> CPUs may end up with incoherent TLB state after a reallocation of the
>> virtual address space.
>
> Ah, gotcha, thank you for laying this out! In which case yes, any vmalloc
> that occurred while an isolated CPU was NOHZ-FULL can be an issue if said
> CPU accesses it during early entry;

So the issue is:

CPU1: unmaps vmalloc page X, which was previously mapped to physical page
P1.

CPU2: does a whole bunch of vmalloc() and vfree() calls, eventually crossing
a lazy-purge threshold and sending out IPIs. It then goes ahead and does an
allocation that maps the same virtual page X to physical page P2.

CPU3: is isolated and still holds a TLB entry for X -> P1. It executes some
early entry code before receiving said IPIs, which Valentin's patches defer.

Because the IPI is deferred, CPU3's TLB has not been flushed yet, so an
access by early entry code to page X on this CPU goes through the stale
translation and results in a use-after-free (UAF) access to P1.

Is that the issue?

thanks,

 - Joel
