Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release
From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-06-01 06:48:51
Also in:
lkml
On Mon, May 31, 2021 at 10:03:33PM +0300, Adrian Hunter wrote:
On 31/05/21 6:10 pm, Leo Yan wrote:quoted
Hi Peter, Adrian, On Wed, May 19, 2021 at 10:03:19PM +0800, Leo Yan wrote:quoted
Load-acquire and store-release are one-way permeable barriers, which can be used to guarantee the memory ordering between accessing the buffer data and the buffer's head / tail. This patch optimizes the memory ordering with the load-acquire and store-release barriers.Is this patch okay for you? Besides this patch, I have an extra question. You could see for accessing the AUX buffer's head and tail, it also support to use compiler build-in functions for atomicity accessing: __sync_val_compare_and_swap() __sync_bool_compare_and_swap() Since now we have READ_ONCE()/WRITE_ONCE(), do you think we still need to support __sync_xxx_compare_and_swap() atomicity?I don't remember, but it seems to me atomicity is needed only for a 32-bit perf running with a 64-bit kernel.
But that:
do {
old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
} while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail));
doesn't want to make sense to me. It appears to do a cmpxchg with 0
values to load old_tail, and then a further cmpxchg with old_tail to
write the new tail.
That's some really crazy code. That makes absolutely no sense what so
ever and needs to just be deleted.
On top of that, it uses atomic instrincs for a u64, which is not
something that'll actually work on a whole bunch of 32bit platforms.