Thread: 59 messages, 6 authors, 2020-07-21

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2020-07-16 15:34:55
Also in: linux-arch, linux-mm, lkml

----- On Jul 16, 2020, at 7:00 AM, Peter Zijlstra peterz@infradead.org wrote:
> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>>> On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>>>>> On Jul 15, 2020, at 9:15 PM, Nicholas Piggin [off-list ref] wrote:
>>>> But I’m wondering if all this deferred sync stuff is wrong. In the
>>>> brave new world of io_uring and such, perhaps kernel access matter
>>>> too.  Heck, even:
>>>
>>> IIRC the membarrier SYNC_CORE use-case is about user-space
>>> self-modifying code.
>>>
>>> Userspace re-uses a text address and needs to SYNC_CORE before it can be
>>> sure the old text is forgotten. Nothing the kernel does matters there.
>>>
>>> I suppose the manpage could be more clear there.
>>
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>>
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
>
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
>
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion.

I agree with Peter that we don't care about accesses to user-space
memory performed concurrently with membarrier.

What we'd care about in terms of accesses to user-space memory from the
kernel is something that would be clearly ordered as happening before
or after the membarrier call, for instance a read(2) followed by
membarrier(2) after the read returns, or a read(2) issued after return
from membarrier(2). The other scenario we'd care about is with the compiler
barrier paired with membarrier: e.g. read(2) returns, compiler barrier,
followed by a store. Or load, compiler barrier, followed by write(2).

All those scenarios imply before/after ordering wrt either membarrier or
the compiler barrier. I notice that io_uring has a "completion" queue.
Let's try to come up with realistic usage scenarios.

So the dependency chain would be provided by e.g.:

* Infrequent read / Frequent write, communicating read completion through variable X

wait for io_uring read request completion -> membarrier -> store X=1

with matching

load from X (waiting for X==1) -> asm volatile ("" ::: "memory") -> submit io_uring write request

or this other scenario:

* Frequent read / Infrequent write, communicating read completion through variable X

load from X (waiting for X==1) -> membarrier -> submit io_uring write request

with matching

wait for io_uring read request completion -> asm volatile ("" ::: "memory") -> store X=1
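
To make the first scenario (infrequent read / frequent write) concrete, here is a minimal sketch rather than a definitive implementation. It assumes plain read(2) and write(2) on pipes stand in for waiting on an io_uring read completion and submitting an io_uring write, and that the private expedited membarrier command is an acceptable choice for the heavy side. The names run_scenario, reader, writer, payload, in_pipe and out_pipe are hypothetical, introduced only for illustration.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Compiler-only barrier: the cheap side of the pairing. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* glibc provides no membarrier() wrapper, so invoke the syscall directly. */
static int membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

static int X;            /* 0 = pending, 1 = read completed, -1 = failure */
static char payload;     /* byte filled in by read(2) (hypothetical) */
static int in_pipe[2];   /* stand-in for the io_uring read source */
static int out_pipe[2];  /* stand-in for the io_uring write target */
static int writer_ok;

/* Infrequent side: read completes -> membarrier -> store X. */
static void *reader(void *arg)
{
    int ok;

    (void)arg;
    ok = read(in_pipe[0], &payload, 1) == 1 &&
         membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
    __atomic_store_n(&X, ok ? 1 : -1, __ATOMIC_RELAXED);
    return NULL;
}

/* Frequent side: load X -> compiler barrier -> issue the write. */
static void *writer(void *arg)
{
    int x;

    (void)arg;
    while ((x = __atomic_load_n(&X, __ATOMIC_RELAXED)) == 0)
        sched_yield();
    barrier();
    writer_ok = (x == 1) && write(out_pipe[1], &payload, 1) == 1;
    return NULL;
}

/* Returns 0 on success, 1 if membarrier is unavailable, -1 on other errors. */
int run_scenario(char in, char *out)
{
    pthread_t r, w;
    int rc = -1, in_closed = 0;

    /* PRIVATE_EXPEDITED requires prior registration of the process. */
    if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
        return 1;
    if (pipe(in_pipe) != 0)
        return -1;
    if (pipe(out_pipe) != 0) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }
    X = 0;
    writer_ok = 0;
    if (pthread_create(&w, NULL, writer, NULL) == 0) {
        if (pthread_create(&r, NULL, reader, NULL) == 0) {
            ssize_t n = write(in_pipe[1], &in, 1);
            (void)n;
            close(in_pipe[1]);  /* EOF unblocks the reader if the write failed */
            in_closed = 1;
            pthread_join(r, NULL);
        } else {
            /* No reader will ever set X: release the spinning writer. */
            __atomic_store_n(&X, -1, __ATOMIC_RELAXED);
        }
        pthread_join(w, NULL);
        if (writer_ok && read(out_pipe[0], out, 1) == 1)
            rc = 0;
    }
    if (!in_closed)
        close(in_pipe[1]);
    close(in_pipe[0]);
    close(out_pipe[0]);
    close(out_pipe[1]);
    return rc;
}
```

Calling run_scenario('m', &out) from a main() should hand the byte from the input pipe to the output pipe on kernels that support the private expedited command. The second scenario is the mirror image: the membarrier call moves to the writer, between the load of X and the write, and the reader keeps only barrier() between the read and the store to X.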

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com