Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than... | linuxppc-dev

[RFC PATCH 0/7] mmu context cleanup, lazy tlb cleanup, · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
[RFC PATCH 1/7] asm-generic: add generic MMU versions of mmu context functions · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
[RFC PATCH 2/7] arch: use asm-generic mmu context for no-op implementations · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
[RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Peter Zijlstra <peterz@infradead.org> · 2020-07-10
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-10
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Andy Lutomirski <luto@kernel.org> · 2020-07-10
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-13
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-13
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-13
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Andy Lutomirski <luto@kernel.org> · 2020-07-13
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-13
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Alan Stern <stern@rowland.harvard.edu> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Alan Stern <stern@rowland.harvard.edu> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Alan Stern <stern@rowland.harvard.edu> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Alan Stern <stern@rowland.harvard.edu> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Andy Lutomirski <luto@amacapital.net> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Peter Zijlstra <peterz@infradead.org> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · peterz@infradead.org · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-17
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-20
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-20
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Nicholas Piggin <npiggin@gmail.com> · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · peterz@infradead.org · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Peter Zijlstra <peterz@infradead.org> · 2020-07-21
Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2020-07-21
[RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
Re: [RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions · Peter Zijlstra <peterz@infradead.org> · 2020-07-10
[RFC PATCH 6/7] lazy tlb: allow lazy tlb mm switching to be configurable · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
[RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Peter Zijlstra <peterz@infradead.org> · 2020-07-10
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-13
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Andy Lutomirski <luto@kernel.org> · 2020-07-13
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-13
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Andy Lutomirski <luto@amacapital.net> · 2020-07-13
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-14
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-14
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Andy Lutomirski <luto@amacapital.net> · 2020-07-14
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Peter Zijlstra <peterz@infradead.org> · 2020-07-14
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option · Nicholas Piggin <npiggin@gmail.com> · 2020-07-16
[RFC PATCH 3/7] mm: introduce exit_lazy_tlb · Nicholas Piggin <npiggin@gmail.com> · 2020-07-10

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2020-07-16 06:06:19
Also in: linux-arch, linux-mm, lkml

Excerpts from Andy Lutomirski's message of July 16, 2020 3:18 pm:

quoted

On Jul 15, 2020, at 9:15 PM, Nicholas Piggin [off-list ref] wrote:

Excerpts from Mathieu Desnoyers's message of July 14, 2020 12:13 am:

quoted

----- On Jul 13, 2020, at 9:47 AM, Nicholas Piggin npiggin@gmail.com wrote:

quoted

Excerpts from Nicholas Piggin's message of July 13, 2020 2:45 pm:

quoted

Excerpts from Andy Lutomirski's message of July 11, 2020 3:04 am:

quoted

Also, as it stands, I can easily see in_irq() ceasing to promise to
serialize.  There are older kernels for which it does not promise to
serialize.  And I have plans to make it stop serializing in the
nearish future.

You mean x86's return from interrupt? Sounds fun... you'll konw where to
update the membarrier sync code, at least :)

Oh, I should actually say Mathieu recently clarified a return from
interrupt doesn't fundamentally need to serialize in order to support
membarrier sync core.

Clarification to your statement:

Return from interrupt to kernel code does not need to be context serializing
as long as kernel serializes before returning to user-space.

However, return from interrupt to user-space needs to be context serializing.

Hmm, I'm not sure it's enough even with the sync in the exit_lazy_tlb
in the right places.

A kernel thread does a use_mm, then it blocks and the user process with
the same mm runs on that CPU, and then it calls into the kernel, blocks,
the kernel thread runs again, another CPU issues a membarrier which does
not IPI this one because it's running a kthread, and then the kthread
switches back to the user process (still without having unused the mm),
and then the user process returns from syscall without having done a 
core synchronising instruction.

The cause of the problem is you want to avoid IPI'ing kthreads. Why?
I'm guessing it really only matters as an optimisation in case of idle
threads. Idle thread is easy (well, easier) because it won't use_mm, so 
you could check for rq->curr == rq->idle in your loop (in a suitable 
sched accessor function).

But... I'm not really liking this subtlety in the scheduler for all this 
(the scheduler still needs the barriers when switching out of idle).

Can it be improved somehow? Let me forget x86 core sync problem for now
(that _may_ be a bit harder), and step back and look at what we're doing.
The memory barrier case would actually suffer from the same problem as
core sync, because in the same situation it has no implicit mmdrop in
the scheduler switch code either.

So what are we doing with membarrier? We want any activity caused by the 
set of CPUs/threads specified that can be observed by this thread before 
calling membarrier is appropriately fenced from activity that can be 
observed to happen after the call returns.

CPU0                     CPU1
                        1. user stuff
a. membarrier()          2. enter kernel
b. read rq->curr         3. rq->curr switched to kthread
c. is kthread, skip IPI  4. switch_to kthread
d. return to user        5. rq->curr switched to user thread
                6. switch_to user thread
                7. exit kernel
                        8. more user stuff

As far as I can see, the problem is CPU1 might reorder step 5 and step
8, so you have mmdrop of lazy mm be a mb after step 6.

But why? The membarrier call only cares that there is a full barrier
between 1 and 8, right? Which it will get from the previous context
switch to the kthread.

I must say the memory barrier comments in membarrier could be improved
a bit (unless I'm missing where the main comment is). It's fine to know
what barriers pair with one another, but we need to know which exact
memory accesses it is ordering

      /*
        * Matches memory barriers around rq->curr modification in
        * scheduler.
        */

Sure, but it doesn't say what else is being ordered. I think it's just
the user memory accesses, but would be nice to make that a bit more
explicit. If we had such comments then we might know this case is safe.

I think the funny powerpc barrier is a similar case of this. If we
ever see remote_rq->curr->flags & PF_KTHREAD, then we _know_ that
CPU has or will have issued a memory barrier between running user
code.

So AFAIKS all this membarrier stuff in kernel/sched/core.c could
just go away. Except x86 because thread switch doesn't imply core
sync, so CPU1 between 1 and 8 may never issue a core sync instruction
the same way a context switch must be a full mb.

Before getting to x86 -- Am I right, or way off track here?

I find it hard to believe that this is x86 only. Why would thread switch imply core sync on any architecture?  Is x86 unique in having a stupid expensive core sync that is heavier than smp_mb()?

It's not the thread switch but the return from kernel to user -- at 
least of architectures that implement membarrier SYNC_CORE, x86 can do 
that without serializing.

The thread switch is muddying the waters a bit, it's not the actual 
thread switch we care about, that just happens to be used as a point
where we try to catch the membarrier IPIs that were skipped due to the
PF_KTHREAD optimisation.

I think that doing said check in the lazy tlb exit code is both
unnecessary for the memory ordering and insufficient for pipeline 
serialization.

But I’m wondering if all this deferred sync stuff is wrong. In the brave new world of io_uring and such, perhaps kernel access matter too.  Heck, even:

int a[2];

Thread A:
a[0] = 1;
a[1] = 2:

Thread B:

write(fd, a, sizeof(a));

Doesn’t do what thread A is expecting.  Admittedly this particular example is nonsense, but maybe there are sensible cases that matter to someone.

I think kernel accesses probably do matter (or at least they should by 
principle of least surprise). And so I was doubly misleading by labeling
it as "user stuff". I should have distinguished between previous user or
kernel accesses, as opposed to the kernel accesses specifically for the
implementation of the membarrier call.

So I think the membarrier code gets *that* part right (modulo what we 
have seen already) if the kernel access is being done from process
context.

But yes if the access is coming from io_uring that has done
kthread_use_mm or some other random code running in a kernel thread
working on get_user_pages memory or any similar shared vm_insert_pfn
memory, then it goes completely to hell.

So good catch, PF_KTHREAD check is problematic there even if no actual
users exist today. rq->curr == rq->idle test might be better, but can
we have interrupts writing completions into user memory? For performance
I would hope so, so that makes even that test problematic.

Maybe membarrier should close that gap entirely, and work around performance
issue by adding _USER_ONLY flags which explicitly only order user mode
accesess vs other user accesses.

Thanks,
Nick

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help