Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and... | linux-perf-users

[PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Leo Yan <hidden> · 2021-05-19
[PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Leo Yan <hidden> · 2021-05-19
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Leo Yan <hidden> · 2021-05-31
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Peter Zijlstra <peterz@infradead.org> · 2021-05-31
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Adrian Hunter <adrian.hunter@intel.com> · 2021-05-31
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Leo Yan <hidden> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Peter Zijlstra <peterz@infradead.org> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Adrian Hunter <adrian.hunter@intel.com> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Peter Zijlstra <peterz@infradead.org> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Adrian Hunter <adrian.hunter@intel.com> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Peter Zijlstra <peterz@infradead.org> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Leo Yan <hidden> · 2021-06-01
Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release · Peter Zijlstra <peterz@infradead.org> · 2021-06-01
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Adrian Hunter <adrian.hunter@intel.com> · 2021-05-27
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Peter Zijlstra <peterz@infradead.org> · 2021-05-27
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Adrian Hunter <adrian.hunter@intel.com> · 2021-05-27
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Adrian Hunter <adrian.hunter@intel.com> · 2021-05-27
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Peter Zijlstra <peterz@infradead.org> · 2021-05-27
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Leo Yan <hidden> · 2021-05-31
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Peter Zijlstra <peterz@infradead.org> · 2021-05-31
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Leo Yan <hidden> · 2021-06-01
Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers · Peter Zijlstra <peterz@infradead.org> · 2021-05-27

Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release

From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-06-01 06:48:51
Also in: lkml

On Mon, May 31, 2021 at 10:03:33PM +0300, Adrian Hunter wrote:

On 31/05/21 6:10 pm, Leo Yan wrote:

quoted

Hi Peter, Adrian,

On Wed, May 19, 2021 at 10:03:19PM +0800, Leo Yan wrote:

quoted

Load-acquire and store-release are one-way permeable barriers, which can
be used to guarantee the memory ordering between accessing the buffer
data and the buffer's head / tail.

This patch optimizes the memory ordering with the load-acquire and
store-release barriers.

Is this patch okay for you?

Besides this patch, I have an extra question.  You could see for
accessing the AUX buffer's head and tail, it also support to use
compiler build-in functions for atomicity accessing:

  __sync_val_compare_and_swap()
  __sync_bool_compare_and_swap()

Since now we have READ_ONCE()/WRITE_ONCE(), do you think we still need
to support __sync_xxx_compare_and_swap() atomicity?

I don't remember, but it seems to me atomicity is needed only
for a 32-bit perf running with a 64-bit kernel.

But that:

	do {
		old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
	} while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail));

doesn't want to make sense to me. It appears to do a cmpxchg with 0
values to load old_tail, and then a further cmpxchg with old_tail to
write the new tail.

That's some really crazy code. That makes absolutely no sense what so
ever and needs to just be deleted.

On top of that, it uses atomic instrincs for a u64, which is not
something that'll actually work on a whole bunch of 32bit platforms.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help