Re: [PATCH rdma-core] irdma: Restore full memory barrier for doorbell optimization
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-08-19 22:44:12
On Thu, Aug 19, 2021 at 10:01:50PM +0000, Nikolova, Tatyana E wrote:
quoted
From: Jason Gunthorpe <jgg@nvidia.com> Sent: Wednesday, August 18, 2021 11:50 AM To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org Subject: Re: [PATCH rdma-core] irdma: Restore full memory barrier for doorbell optimization On Fri, Aug 13, 2021 at 05:25:49PM -0500, Tatyana Nikolova wrote:quoted
quoted
quoted
1. Software writing the valid bit in the WQE. 2. Software reading shadow memory (hw_tail) value.quoted
You are missing an ordered atomic on this read it looks likeHi Jason, Why do you think we need atomic ops in this case? We aren't trying to protect from multiple threads but CPU re-ordering of a write and a read.Which is what the atomics will do. Barriers are only appropriate when you can't add atomic markers to the actual data that needs ordering.Hi Jason, We aren't sure what you mean by atomic markers. We ran a few experiments with atomics, but none of the barriers we tried smp_mb__{before,after}_atomic(), smp_load_acquire() and smp_store_release() translates to a full memory barrier on X86.
Huh? Those are kernel primitives, this is a userspace patch. Userspace follows the C11 atomics memory model. So I'd expect atomic_store_explicit(tail, memory_order_release) atomic_load_explicit(tail, memory_order_acquire) To be the atomics you need. This will ensure that the read/writes to valid before the atomics are sequenced correctly, eg no CPU thread can observe an updated tail without also observing the set valid. Jason