Re: [PATCH rdma-core] irdma: Restore full memory barrier for doorbell optimization

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-08-19 22:44:12

On Thu, Aug 19, 2021 at 10:01:50PM +0000, Nikolova, Tatyana E wrote:

quoted

From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Wednesday, August 18, 2021 11:50 AM
To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
Subject: Re: [PATCH rdma-core] irdma: Restore full memory barrier for
doorbell optimization

On Fri, Aug 13, 2021 at 05:25:49PM -0500, Tatyana Nikolova wrote:

quoted

1.	Software writing the valid bit in the WQE.
2.	Software reading shadow memory (hw_tail) value.

quoted

You are missing an ordered atomic on this read it looks like

Hi Jason,

Why do you think we need atomic ops in this case? We aren't trying to
protect from multiple threads but CPU re-ordering of a write and a
read.

Which is what the atomics will do.

Barriers are only appropriate when you can't add atomic markers to the
actual data that needs ordering.

Hi Jason,

We aren't sure what you mean by atomic markers. We ran a few
experiments with atomics, but none of the barriers we tried
smp_mb__{before,after}_atomic(), smp_load_acquire() and
smp_store_release() translates to a full memory barrier on X86.

Huh? Those are kernel primitives, this is a userspace patch.

Userspace follows the C11 atomics memory model.

So I'd expect 

  atomic_store_explicit(tail, memory_order_release)
  atomic_load_explicit(tail, memory_order_acquire)

To be the atomics you need. This will ensure that the read/writes to
valid before the atomics are sequenced correctly, eg no CPU thread can
observe an updated tail without also observing the set valid.

Jason

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help