Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers
From: Adrian Hunter <adrian.hunter@intel.com>
Date: 2021-05-27 09:23:57
Also in: lkml
On 27/05/21 11:25 am, Adrian Hunter wrote:
> On 27/05/21 11:11 am, Peter Zijlstra wrote:
>> On Thu, May 27, 2021 at 10:54:56AM +0300, Adrian Hunter wrote:
>>> On 19/05/21 5:03 pm, Leo Yan wrote:
>>>> The AUX ring buffer's head and tail can be accessed from multiple CPUs
>>>> on SMP system, so changes to use SMP memory barriers to replace the
>>>> uniprocessor barriers.
>>>
>>> I don't think user space should attempt to be SMP-aware.
>>
>> Uhh, what? It pretty much has to. Since userspace cannot assume UP, it
>> must assume SMP.
>
> Yeah that is what I meant, but consequently we generally shouldn't be
> using functions called smp_<anything>
>
>>> For perf tools, on __x86_64__ it looks like smp_rmb() is only a
>>> compiler barrier, whereas rmb() is a "lfence" memory barrier
>>> instruction, so this patch does not seem to do what the commit
>>> message says at least for x86.
>>
>> The commit message is somewhat confused; *mb() are not UP barriers
>> (although they are available and useful on UP). They're device/dma
>> barriers.
>>
>>> With regard to the AUX area, we don't know in general how data gets
>>> there, so using memory barriers seems sensible.
>>
>> IIRC (but I didn't check) the rule was that the kernel needs to ensure
>> the AUX area is complete before it updates the head pointer. So if
>> userspace can observe the head pointer, it must then also be able to
>> observe the data. This is not something userspace can fix up anyway.
>>
>> The ordering here is between the head pointer and the data, and from a
>> userspace perspective that's a regular smp ordering. Similar for the
>> tail update, that's between our reading the data and writing the tail,
>> regular cache coherent smp ordering.
>>
>> So ACK on the patch, it's sane and an optimization for both x86 and ARM.
>> Just the Changelog needs work.
>
> If all we want is a compiler barrier, then shouldn't that be what we
> use? i.e. barrier()

I guess you are saying we still need to stop potential re-ordering across CPUs, so please ignore my comments.
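
For illustration only, the ordering Peter describes (head versus data on the read side, data versus tail on the write side) can be sketched with C11 atomics. This is not the perf code: `struct demo_ring`, `demo_produce()`, `demo_consume()` and `DEMO_RING_SIZE` are hypothetical names, the producer stands in for the kernel, and the release store on the tail is an assumption about the minimum ordering needed (the patch itself keeps a full smp_mb() there).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* hypothetical; must be a power of two */

/* Hypothetical stand-in for the shared AUX area; not the perf ABI. */
struct demo_ring {
	_Atomic uint64_t head;	/* advanced by the producer only */
	_Atomic uint64_t tail;	/* advanced by the consumer only */
	uint8_t data[DEMO_RING_SIZE];
};

/*
 * Producer: write the payload first, then publish it with a release
 * store. The caller is assumed to have checked that there is room.
 */
static void demo_produce(struct demo_ring *r, const uint8_t *src, size_t len)
{
	uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

	for (size_t i = 0; i < len; i++)
		r->data[(head + i) & (DEMO_RING_SIZE - 1)] = src[i];
	atomic_store_explicit(&r->head, head + len, memory_order_release);
}

/* Consumer: copy out up to cap bytes and return how many were copied. */
static size_t demo_consume(struct demo_ring *r, uint8_t *dst, size_t cap)
{
	uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire: the data reads below cannot move before this load. */
	uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	size_t n = 0;

	assert(dst != NULL);
	while (tail + n < head && n < cap) {
		dst[n] = r->data[(tail + n) & (DEMO_RING_SIZE - 1)];
		n++;
	}
	/* Release: the data reads above finish before the tail is visible. */
	atomic_store_explicit(&r->tail, tail + n, memory_order_release);
	return n;
}
```

A compiler-only barrier would stop the compiler from reordering these accesses but would not constrain a weakly ordered CPU, which is why the acquire/release pair (or smp_rmb()/smp_mb()) is still needed in general.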
>>>> Signed-off-by: Leo Yan <redacted>
>>>> ---
>>>>  tools/perf/util/auxtrace.h | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>>>> index 472c0973b1f1..8bed284ccc82 100644
>>>> --- a/tools/perf/util/auxtrace.h
>>>> +++ b/tools/perf/util/auxtrace.h
>>>> @@ -452,7 +452,7 @@ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
>>>>  	u64 head = READ_ONCE(pc->aux_head);
>>>>  	/* Ensure all reads are done after we read the head */
>>>> -	rmb();
>>>> +	smp_rmb();
>>>>  	return head;
>>>>  }
>>>> @@ -466,7 +466,7 @@ static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
>>>>  #endif
>>>>  	/* Ensure all reads are done after we read the head */
>>>> -	rmb();
>>>> +	smp_rmb();
>>>>  	return head;
>>>>  }
>>>> @@ -478,7 +478,7 @@ static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
>>>>  #endif
>>>>  	/* Ensure all reads are done before we write the tail out */
>>>> -	mb();
>>>> +	smp_mb();
>>>>  #if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
>>>>  	pc->aux_tail = tail;
>>>>  #else
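
As a rough portable analogue of the patched helpers, the same shape can be written with C11 fences. This is a sketch under assumptions, not the definitions in the tools headers: the `demo_*` macros, `struct demo_page` and both functions are hypothetical names. On x86-64 compilers typically emit no instruction for an acquire fence, which matches the observation above that smp_rmb() is only a compiler barrier there.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical macros; not the definitions used by perf tools. */
#define demo_barrier()	atomic_signal_fence(memory_order_seq_cst) /* compiler only */
#define demo_smp_rmb()	atomic_thread_fence(memory_order_acquire)
#define demo_smp_mb()	atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical stand-in for the two control-page fields touched above. */
struct demo_page {
	_Atomic uint64_t aux_head;
	_Atomic uint64_t aux_tail;
};

static inline uint64_t demo_read_head(struct demo_page *pc)
{
	/* Relaxed load plays the role of READ_ONCE() in this sketch. */
	uint64_t head = atomic_load_explicit(&pc->aux_head,
					     memory_order_relaxed);

	/* Ensure all reads are done after we read the head */
	demo_smp_rmb();
	return head;
}

static inline void demo_write_tail(struct demo_page *pc, uint64_t tail)
{
	/* Ensure all reads are done before we write the tail out */
	demo_smp_mb();
	atomic_store_explicit(&pc->aux_tail, tail, memory_order_relaxed);
}
```

`demo_barrier()` is included only to show what a pure compiler barrier looks like in this style; it constrains the compiler but says nothing about ordering between CPUs.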