Thread (7 messages) 7 messages, 3 authors, 2010-09-30

Re: [PATCH] net: code cleanups

From: Eric Dumazet <hidden>
Date: 2010-09-30 06:31:46

Le jeudi 30 septembre 2010 à 14:09 +0800, Changli Gao a écrit :
On Thu, Sep 30, 2010 at 1:16 PM, Eric Dumazet [off-list ref] wrote:
quoted
Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :
quoted
Compare operations are more readable, and compilers generate the same code
for the both.
You have a buggy compiler then.
gcc version 4.4.3 (Gentoo 4.4.3-r2 p1.2)

             rth = rcu_dereference(rth->dst.rt_next)) {
                if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
                     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
                     (rth->fl.iif ^ iif) |
    2f12:       44 3b 80 dc 00 00 00    cmp    0xdc(%rax),%r8d
    2f19:       0f 85 a2 00 00 00       jne    2fc1 <ip_route_input_common+0x145
quoted
                     rth->fl.oif |
    2f1f:       83 b8 d8 00 00 00 00    cmpl   $0x0,0xd8(%rax)
    2f26:       0f 85 95 00 00 00       jne    2fc1 <ip_route_input_common+0x145
quoted
        tos &= IPTOS_RT_MASK;
        hash = rt_hash(daddr, saddr, iif, rt_genid(net));

        for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
             rth = rcu_dereference(rth->dst.rt_next)) {
                if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
    2f2c:       44 3b b8 e4 00 00 00    cmp    0xe4(%rax),%r15d
    2f33:       0f 85 88 00 00 00       jne    2fc1
<ip_route_input_common+0x145>
                     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
    2f39:       44 3b b0 e8 00 00 00    cmp    0xe8(%rax),%r14d
    2f40:       75 7f                   jne    2fc1
<ip_route_input_common+0x145>

quoted
I know this code is ugly, but please keep it as is, dont add conditional
branches on hot paths.
If the compiler doesn't generate conditional branches, we have to
touch every necessary field of all the cache entries in one hash
bucket. Is it better than condition branch? I think the compiler
developers know it better.
Last famous words.

Are you aware of cache lines (64 bytes at least on typical cpus), and
that all fields are already in CPU L1 cache ? I (and others) worked hard
in the past.
And the compiler reorders the conditional branches, is it expected?
Your compiler added conditional branches on a code not wanting them,
only because on _your_ cpu, these conditional branches might be cheap.

Now, try to compile for an i686 target and see the difference.

If there was no difference, your compiler would be _buggy_, because not
generating optimal assembly.

Here I get :

c141dda9:       8b 55 e8                mov    -0x18(%ebp),%edx
c141ddac:       8b 81 9c 00 00 00       mov    0x9c(%ecx),%eax
c141ddb2:       33 91 a0 00 00 00       xor    0xa0(%ecx),%edx
c141ddb8:       31 f0                   xor    %esi,%eax
c141ddba:       09 d0                   or     %edx,%eax
c141ddbc:       8b 55 e0                mov    -0x20(%ebp),%edx
c141ddbf:       33 91 94 00 00 00       xor    0x94(%ecx),%edx
c141ddc5:       09 d0                   or     %edx,%eax
c141ddc7:       0f b6 55 e7             movzbl -0x19(%ebp),%edx
c141ddcb:       0b 81 90 00 00 00       or     0x90(%ecx),%eax
c141ddd1:       32 91 a4 00 00 00       xor    0xa4(%ecx),%dl
c141ddd7:       0f b6 d2                movzbl %dl,%edx
c141ddda:       09 d0                   or     %edx,%eax
c141dddc:       0f 85 9d 00 00 00       jne    c141de7f <ip_route_input_common+0x1b4>

As you can see, only one conditional branch.

Your patch is not welcomed, thanks.

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help