Thread (32 messages) 32 messages, 8 authors, 2023-09-27

Re: [PATCH V13 - RESEND 02/10] arm64/perf: Add BRBE registers and fields

From: Mark Rutland <mark.rutland@arm.com>
Date: 2023-08-15 13:06:45
Also in: linux-perf-users, lkml

On Tue, Aug 15, 2023 at 11:17:19AM +0100, James Clark wrote:

On 31/07/2023 10:06, Mark Rutland wrote:
quoted
On Mon, Jul 31, 2023 at 08:03:21AM +0530, Anshuman Khandual wrote:
quoted

On 7/28/23 22:22, James Clark wrote:
quoted

On 28/07/2023 17:20, Will Deacon wrote:
quoted
On Tue, Jul 11, 2023 at 01:54:47PM +0530, Anshuman Khandual wrote:
quoted
This adds BRBE related register definitions and various other related field
macros there in. These will be used subsequently in a BRBE driver which is
being added later on.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Tested-by: James Clark <redacted>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <redacted>
---
 arch/arm64/include/asm/sysreg.h | 103 +++++++++++++++++++++
 arch/arm64/tools/sysreg         | 158 ++++++++++++++++++++++++++++++++
 2 files changed, 261 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b481935e9314..f95e30c13c8b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -163,6 +163,109 @@
 #define SYS_DBGDTRTX_EL0		sys_reg(2, 3, 0, 5, 0)
 #define SYS_DBGVCR32_EL2		sys_reg(2, 4, 0, 7, 0)
 
+#define __SYS_BRBINFO(n)		sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 0))
+#define __SYS_BRBSRC(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 1))
+#define __SYS_BRBTGT(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 2))
It's that time on a Friday but... aren't these macros busted? I think you
need brackets before adding the offset, otherwise wouldn't, for example,
target registers 0-15 all access info register 0 and __SYS_BRBTGT(16) would
then start accessing source register 0?

I'm surprised that the compiler doesn't warn about this, but even more
surprised that you managed to test this.

Please tell me I'm wrong!

Will
No I think you are right, it is wrong. Luckily there is already an
extraneous bracket so you you can fix it by moving one a place down:

  sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))

It's interesting because the test [1] is doing quite a bit and looking
at the branch info, and that src and targets match up to function names.
I also manually looked at the branch buffers and didn't see anything
obviously wrong like things that looked like branch infos in the source
or target fields. Will have to take another look to see if it would be
possible for the test to catch this.

James

[1]:
https://gitlab.arm.com/linux-arm/linux-jc/-/commit/3a7ddce70c2daadb63fcc511de0a89055ca48b32
((((n) & 0x10)) >> 2 + 2) ---> ((((n) & 0x10) >> 2) + 2)

The additional brackets are useful in explicitly telling the compiler but
what it the compiler is just doing the right thing implicitly i.e computing
the shifting operation before doing the offset addition.
No; that is not correct. In c, '+' has higher precedence than '>>'.

For 'a >> b + c' the compiler *must* treat that as 'a >> (b + c)', and not as
'(a >> b) + c'

That's trivial to test:

| [mark@gravadlaks:~]% cat shiftadd.c 
| #include <stdio.h>
| 
| unsigned long logshiftadd(unsigned long a,
|                           unsigned long b,
|                           unsigned long c)
| {
|         printf("%ld >> %ld + %ld is %ld\n",
|                a, b, c, a >> b + c);
| }
| 
| int main(int argc, char *argv)
| {
|         logshiftadd(0, 0, 0);
|         logshiftadd(0, 0, 1);
|         logshiftadd(0, 0, 2);
|         printf("\n");
|         logshiftadd(1024, 0, 0);
|         logshiftadd(1024, 0, 1);
|         logshiftadd(1024, 0, 2);
|         printf("\n");
|         logshiftadd(1024, 2, 0);
|         logshiftadd(1024, 2, 1);
|         logshiftadd(1024, 2, 2);
| 
|         return 0;
| }
| [mark@gravadlaks:~]% gcc shiftadd.c -o shiftadd
| [mark@gravadlaks:~]% ./shiftadd 
| 0 >> 0 + 0 is 0
| 0 >> 0 + 1 is 0
| 0 >> 0 + 2 is 0
| 
| 1024 >> 0 + 0 is 1024
| 1024 >> 0 + 1 is 512
| 1024 >> 0 + 2 is 256
| 
| 1024 >> 2 + 0 is 256
| 1024 >> 2 + 1 is 128
| 1024 >> 2 + 2 is 64
quoted
During testing, all > those captured branch records looked alright.
I think we clearly need better testing here.

Thanks,
Mark.
Hi Will and Mark,

So I started looking into the test both with and without the fix,
strangely I couldn't see any difference in the branch outputs, or
anywhere in the driver where it would be flipping or filtering anything
to make it only appear to be working. This was a bit confusing, but
added up with the original point that the test was passing and it was
actually doing something.

So I started going deeper and found what the issue (non-issue) is.

Firstly why is there no warning:

The expression is stringified and passed to the assembler, so it skips
the C compiler warning settings. I can send a patch to fix this, but all
we need to do is get the compiler to evaluate the argument and then
throw it away, luckily there are no other issues found even with an
allyesconfig, so BRBE was the only thing with this bug:

 #define read_sysreg_s(r) ({
 	u64 __val;
+	u32 __maybe_unused __check_r = (u32)(r);
 	asm volatile(__mrs_s("%0", r) : "=r" (__val));
 	__val;					
 })


Secondly, why does BRBE actually work:

Well the assembler (at least in my Clang toolchain) has a different
order of operations to C. I put a minimal repro here:
https://godbolt.org/z/YP9adh5xh

You can see the op2 should be a 0b100000 (0x20) for BRBSRC and it
appears to be, you can also see that moving the bracket makes no
difference in this case.

For some more evidence, the disassembler I have locally actually gives
the correct register name, even when the bracket is wrong, and diffing
the .o file gives no difference when moving the bracket:

  0000000000000008 <main>:
   8:   d503245f        bti     c
   c:   d503201f        nop
  10:   d503201f        nop
  14:   2a1f03e0        mov     w0, wzr
  18:   d5318028        mrs     x8, brbsrc0_el1
  1c:   d5318128        mrs     x8, brbsrc1_el1
  20:   d65f03c0        ret

Seems completely crazy to me that this is actually the case. So maybe I
am also wrong. Don't know if this counts as a compiler bug or it's just
supposed to be like that.
From a quick dig, it's supposed to be like that: the GNU assembler uses a
different operator precedence to C, and clang's assembler does the same for
compatibility. What a great.

Compare:

  https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_6.html#SEC66

... with:

  https://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence

Adding the brackets will make this work in either case, so I think that's the
right thing to do for now.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help