RE: [PATCH v2] MIPS: Check __clang__ to avoid performance influence with GCC in csum_tcpudp_nofold()
From: David Laight <hidden>
Date: 2021-03-15 12:42:44
Also in:
lkml
From: Tiezhu Yang <yangtiezhu@loongson.cn>
quoted hunk ↗ jump to hunk
Sent: 15 March 2021 12:26 On 03/15/2021 04:49 AM, Maciej W. Rozycki wrote:quoted
On Tue, 9 Mar 2021, Tiezhu Yang wrote:quoted
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h index 1e6c135..80eddd4 100644 --- a/arch/mips/include/asm/checksum.h +++ b/arch/mips/include/asm/checksum.h@@ -128,9 +128,13 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl) static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, __u32 len, __u8 proto, - __wsum sum) + __wsum sum_in) { - unsigned long tmp = (__force unsigned long)sum; +#ifdef __clang__ + unsigned long sum = (__force unsigned long)sum_in; +#else + __wsum sum = sum_in; +#endifThis looks much better to me, but I'd keep the variable names unchanged as `sum_in' isn't used beyond the initial assignment anyway (you'll have to update the references with asm operands accordingly of course). Have you verified that code produced by GCC remains the same with your change in place as it used to be up to commit 198688edbf77? I can see no such information in the commit description whether here or in the said commit. MaciejHi Maciej, Thanks for your reply. gcc --version gcc (Debian 10.2.1-6) 10.2.1 20210110 net/ipv4/tcp_ipv4.c tcp_v4_send_reset() csum_tcpudp_nofold() objdump -d vmlinux > vmlinux.dump (1) Before commit 198688edbf77 ("MIPS: Fix inline asm input/output type mismatch in checksum.h used with Clang"): ffffffff80aa835c: 00004025 move a4,zero ffffffff80aa8360: 92020012 lbu v0,18(s0) ffffffff80aa8364: de140030 ld s4,48(s0) ffffffff80aa8368: 0064182d daddu v1,v1,a0 ffffffff80aa836c: 304200ff andi v0,v0,0xff ffffffff80aa8370: 9c64000c lwu a0,12(v1) ffffffff80aa8374: 9c660010 lwu a2,16(v1) ffffffff80aa8378: afa70038 sw a3,56(sp) ffffffff80aa837c: 24071a00 li a3,6656 ffffffff80aa8380: 0086202d daddu a0,a0,a2 ffffffff80aa8384: 0087202d daddu a0,a0,a3 ffffffff80aa8388: 0088202d daddu a0,a0,a4 ffffffff80aa838c: 0004083c dsll32 at,a0,0x0 ffffffff80aa8390: 0081202d daddu a0,a0,at ffffffff80aa8394: 0081082b sltu at,a0,at ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0 ffffffff80aa839c: 00812021 addu a0,a0,at (2) After commit 198688edbf77 ("MIPS: Fix inline asm input/output type mismatch in checksum.h used with Clang"): ffffffff80aa836c: 00004025 move a4,zero ffffffff80aa8370: 92040012 lbu a0,18(s0) ffffffff80aa8374: de140030 ld s4,48(s0) ffffffff80aa8378: 0062182d daddu v1,v1,v0 ffffffff80aa837c: 308400ff andi a0,a0,0xff ffffffff80aa8380: 9c62000c lwu v0,12(v1) ffffffff80aa8384: 9c660010 lwu a2,16(v1) ffffffff80aa8388: afa70038 sw a3,56(sp) ffffffff80aa838c: 24071a00 li a3,6656 ffffffff80aa8390: 0046102d daddu v0,v0,a2 ffffffff80aa8394: 0047102d daddu v0,v0,a3 ffffffff80aa8398: 0048102d daddu v0,v0,a4 ffffffff80aa839c: 0002083c dsll32 at,v0,0x0 ffffffff80aa83a0: 0041102d daddu v0,v0,at ffffffff80aa83a4: 0041082b sltu at,v0,at ffffffff80aa83a8: 0002103f dsra32 v0,v0,0x0 ffffffff80aa83ac: 00411021 addu v0,v0,at (3) With this patch: ffffffff80aa835c: 00004025 move a4,zero ffffffff80aa8360: 92020012 lbu v0,18(s0) ffffffff80aa8364: de140030 ld s4,48(s0) ffffffff80aa8368: 0064182d daddu v1,v1,a0 ffffffff80aa836c: 304200ff andi v0,v0,0xff ffffffff80aa8370: 9c64000c lwu a0,12(v1) ffffffff80aa8374: 9c660010 lwu a2,16(v1) ffffffff80aa8378: afa70038 sw a3,56(sp) ffffffff80aa837c: 24071a00 li a3,6656 ffffffff80aa8380: 0086202d daddu a0,a0,a2 ffffffff80aa8384: 0087202d daddu a0,a0,a3 ffffffff80aa8388: 0088202d daddu a0,a0,a4 ffffffff80aa838c: 0004083c dsll32 at,a0,0x0 ffffffff80aa8390: 0081202d daddu a0,a0,at ffffffff80aa8394: 0081082b sltu at,a0,at ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0 ffffffff80aa839c: 00812021 addu a0,a0,at (4) With the following changes based on commit 198688edbf77 ("MIPS: Fix inline asm input/output type mismatch in checksum.h used with Clang"):diff --git a/arch/mips/include/asm/checksum.hb/arch/mips/include/asm/checksum.h index 1e6c135..e1f80407 100644--- a/arch/mips/include/asm/checksum.h +++ b/arch/mips/include/asm/checksum.h@@ -130,7 +130,11 @@ static inline __wsum csum_tcpudp_nofold(__be32saddr, __be32 daddr, __u32 len, __u8 proto, __wsum sum) { +#ifdef __clang__ unsigned long tmp = (__force unsigned long)sum; +#else + __wsum tmp = sum; +#endif __asm__( " .set push # csum_tcpudp_nofold\n" ffffffff80aa835c: 00004025 move a4,zero ffffffff80aa8360: 92020012 lbu v0,18(s0) ffffffff80aa8364: de140030 ld s4,48(s0) ffffffff80aa8368: 0064182d daddu v1,v1,a0 ffffffff80aa836c: 304200ff andi v0,v0,0xff ffffffff80aa8370: 9c64000c lwu a0,12(v1) ffffffff80aa8374: 9c660010 lwu a2,16(v1) ffffffff80aa8378: afa70038 sw a3,56(sp) ffffffff80aa837c: 24071a00 li a3,6656 ffffffff80aa8380: 0086202d daddu a0,a0,a2 ffffffff80aa8384: 0087202d daddu a0,a0,a3 ffffffff80aa8388: 0088202d daddu a0,a0,a4 ffffffff80aa838c: 0004083c dsll32 at,a0,0x0 ffffffff80aa8390: 0081202d daddu a0,a0,at ffffffff80aa8394: 0081082b sltu at,a0,at ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0 ffffffff80aa839c: 00812021 addu a0,a0,at The code produced by GCC remains the same between (1), (3) and (4), the last changes looks like better (with less changes based on commit 198688edbf77), so I will send v3 later.
Aren't those all the same - apart from register selection. Not that I grok the mips opcodes. But that code has horridness on its side. The only obvious difference is that something else changes the code offset from xxxx5c to xxxx6c. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)