Re: [PATCH] net: code cleanups
From: Eric Dumazet <hidden>
Date: 2010-09-30 06:31:46
Le jeudi 30 septembre 2010 à 14:09 +0800, Changli Gao a écrit :
On Thu, Sep 30, 2010 at 1:16 PM, Eric Dumazet [off-list ref] wrote:quoted
Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :quoted
Compare operations are more readable, and compilers generate the same code for the both.You have a buggy compiler then.gcc version 4.4.3 (Gentoo 4.4.3-r2 p1.2) rth = rcu_dereference(rth->dst.rt_next)) { if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) | ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) | (rth->fl.iif ^ iif) | 2f12: 44 3b 80 dc 00 00 00 cmp 0xdc(%rax),%r8d 2f19: 0f 85 a2 00 00 00 jne 2fc1 <ip_route_input_common+0x145quoted
rth->fl.oif | 2f1f: 83 b8 d8 00 00 00 00 cmpl $0x0,0xd8(%rax) 2f26: 0f 85 95 00 00 00 jne 2fc1 <ip_route_input_common+0x145quoted
tos &= IPTOS_RT_MASK; hash = rt_hash(daddr, saddr, iif, rt_genid(net)); for (rth = rcu_dereference(rt_hash_table[hash].chain); rth; rth = rcu_dereference(rth->dst.rt_next)) { if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) | 2f2c: 44 3b b8 e4 00 00 00 cmp 0xe4(%rax),%r15d 2f33: 0f 85 88 00 00 00 jne 2fc1 <ip_route_input_common+0x145> ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) | 2f39: 44 3b b0 e8 00 00 00 cmp 0xe8(%rax),%r14d 2f40: 75 7f jne 2fc1 <ip_route_input_common+0x145>quoted
I know this code is ugly, but please keep it as is, dont add conditional branches on hot paths.If the compiler doesn't generate conditional branches, we have to touch every necessary field of all the cache entries in one hash bucket. Is it better than condition branch? I think the compiler developers know it better.
Last famous words. Are you aware of cache lines (64 bytes at least on typical cpus), and that all fields are already in CPU L1 cache ? I (and others) worked hard in the past.
And the compiler reorders the conditional branches, is it expected?
Your compiler added conditional branches on a code not wanting them, only because on _your_ cpu, these conditional branches might be cheap. Now, try to compile for an i686 target and see the difference. If there was no difference, your compiler would be _buggy_, because not generating optimal assembly. Here I get : c141dda9: 8b 55 e8 mov -0x18(%ebp),%edx c141ddac: 8b 81 9c 00 00 00 mov 0x9c(%ecx),%eax c141ddb2: 33 91 a0 00 00 00 xor 0xa0(%ecx),%edx c141ddb8: 31 f0 xor %esi,%eax c141ddba: 09 d0 or %edx,%eax c141ddbc: 8b 55 e0 mov -0x20(%ebp),%edx c141ddbf: 33 91 94 00 00 00 xor 0x94(%ecx),%edx c141ddc5: 09 d0 or %edx,%eax c141ddc7: 0f b6 55 e7 movzbl -0x19(%ebp),%edx c141ddcb: 0b 81 90 00 00 00 or 0x90(%ecx),%eax c141ddd1: 32 91 a4 00 00 00 xor 0xa4(%ecx),%dl c141ddd7: 0f b6 d2 movzbl %dl,%edx c141ddda: 09 d0 or %edx,%eax c141dddc: 0f 85 9d 00 00 00 jne c141de7f <ip_route_input_common+0x1b4> As you can see, only one conditional branch. Your patch is not welcomed, thanks.