Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: 2020-07-16 15:34:55
Also in: linux-arch, linux-mm, lkml
----- On Jul 16, 2020, at 7:00 AM, Peter Zijlstra peterz@infradead.org wrote:
> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>>> On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>>>> On Jul 15, 2020, at 9:15 PM, Nicholas Piggin [off-list ref] wrote:
>>>>
>>>> But I’m wondering if all this deferred sync stuff is wrong. In the
>>>> brave new world of io_uring and such, perhaps kernel access matter
>>>> too. Heck, even:
>>>
>>> IIRC the membarrier SYNC_CORE use-case is about user-space
>>> self-modifying code.
>>>
>>> Userspace re-uses a text address and needs to SYNC_CORE before it can
>>> be sure the old text is forgotten. Nothing the kernel does matters
>>> there.
>>>
>>> I suppose the manpage could be more clear there.
>>
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>>
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
>
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
>
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion.

I agree with Peter that we don't care about accesses to user-space
memory performed concurrently with membarrier.

What we'd care about in terms of accesses to user-space memory from the
kernel is something that would be clearly ordered as happening before
or after the membarrier call, for instance a read(2) followed by
membarrier(2) after the read returns, or a read(2) issued after return
from membarrier(2). The other scenario we'd care about is with the
compiler barrier paired with membarrier: e.g. read(2) returns, compiler
barrier, followed by a store. Or load, compiler barrier, followed by
write(2).

All those scenarios imply before/after ordering wrt either membarrier
or the compiler barrier. I notice that io_uring has a "completion"
queue. Let's try to come up with realistic usage scenarios.

So the dependency chain would be provided by e.g.:

* Infrequent read / Frequent write, communicating read completion
  through variable X

  wait for io_uring read request completion -> membarrier -> store X=1

  with matching

  load from X (waiting for X==1) -> asm volatile (::: "memory") ->
  submit io_uring write request

or this other scenario:

* Frequent read / Infrequent write, communicating read completion
  through variable X

  load from X (waiting for X==1) -> membarrier -> submit io_uring write
  request

  with matching

  wait for io_uring read request completion ->
  asm volatile (::: "memory") -> store X=1

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
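
For illustration, the first dependency chain above (infrequent read / frequent write) could be sketched in C roughly as follows. This is only a sketch under stated assumptions, not code from the patch series: plain read(2) and write(2) on pipes stand in for the io_uring read completion and write submission, and the names sys_membarrier, barrier_init, slow_side, fast_side and run_scenario are hypothetical. The full-fence fallback is there only so the sketch keeps running where membarrier(2) is unavailable; it does not give the same guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static char buf[8];      /* data produced by the read, consumed by the write */
static volatile int X;   /* completion flag shared between the two sides */

/* Hypothetical thin wrapper around the raw membarrier(2) system call. */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/* Private expedited commands must be registered once per process. */
static int barrier_init(void)
{
    return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);
}

/* Infrequent side: read completes -> membarrier -> store X=1. */
static int slow_side(int in_fd)
{
    /* Assumption: read(2) stands in for "wait for io_uring read completion". */
    ssize_t n = read(in_fd, buf, sizeof(buf));
    if (n <= 0)
        return -1;
    if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) != 0)
        __atomic_thread_fence(__ATOMIC_SEQ_CST); /* fallback, NOT equivalent */
    X = 1;
    return (int)n;
}

/* Frequent side: load X -> compiler barrier -> submit write. */
static int fast_side(int out_fd, size_t len)
{
    assert(len <= sizeof(buf));
    if (X != 1)
        return 0;                            /* read not completed yet */
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier only */
    /* Assumption: write(2) stands in for "submit io_uring write request". */
    return (int)write(out_fd, buf, len);
}

/* Single-threaded walk through the chain; returns 0 if the data arrived. */
int run_scenario(void)
{
    int src[2], dst[2], rc = -1;
    char out[8] = {0};

    X = 0;
    if (pipe(src) != 0)
        return -1;
    if (pipe(dst) != 0) {
        close(src[0]);
        close(src[1]);
        return -1;
    }
    (void)barrier_init();                    /* may fail; see fallback above */

    if (write(src[1], "ping", 4) == 4 &&
        fast_side(dst[1], 4) == 0 &&         /* X still 0: nothing submitted */
        slow_side(src[0]) == 4 &&
        fast_side(dst[1], 4) == 4 &&         /* X == 1: write goes out */
        read(dst[0], out, sizeof(out)) == 4)
        rc = memcmp(out, "ping", 4);

    close(src[0]); close(src[1]);
    close(dst[0]); close(dst[1]);
    return rc;
}
```

In real use slow_side() and fast_side() would run on different threads, with the frequent side polling X; run_scenario() only calls them in order from one thread to show the before/after ordering each side relies on. The second scenario would simply swap which side issues membarrier and which side uses the compiler barrier.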