Thread (58 messages), 6 authors, 2005-07-31

Re: [PATCH] loop unrolling in net/sched/sch_generic.c

From: Eric Dumazet <hidden>
Date: 2005-07-05 15:58:39

Thomas Graf wrote:
quoted
OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop:

Because you don't specify -funroll-loops
I'm using vanilla 2.6.12: there is no -funroll-loops in it. Maybe it is in your tree, but not in 99.9% of 2.6.12 trees.

Are you suggesting everybody should use this compiler flag?
Something like:

net/sched/Makefile:

CFLAGS_sch_generic.o := -funroll-loops

?
[...]

quoted
Please give us the code your compiler produces,

Unrolled version:

pfifo_fast_dequeue:
	pushl	%esi
	xorl	%edx, %edx
	pushl	%ebx
	movl	12(%esp), %esi
	movl	128(%esi), %eax
	leal	128(%esi), %ecx
	cmpl	%ecx, %eax
	je	.L132
	movl	%eax, %edx
	movl	(%eax), %eax
	decl	8(%ecx)
	movl	$0, 8(%edx)
	movl	%ecx, 4(%eax)
	movl	%eax, 128(%esi)
	movl	$0, 4(%edx)
	movl	$0, (%edx)
.L132:
	testl	%edx, %edx
	je	.L131
	movl	96(%edx), %ebx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ebx, %eax
	movl	%eax, 80(%esi)
	movl	%edx, %eax
.L117:
	popl	%ebx
	popl	%esi
	ret
.L131:
	movl	20(%ecx), %eax
	leal	20(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L137
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 20(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L137:
	testl	%ebx, %ebx
	je	.L147
.L146:
	movl	96(%ebx), %ecx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ecx, %eax
	movl	%eax, 80(%esi)
	movl	%ebx, %eax
	jmp	.L117
.L147:
	movl	40(%ecx), %eax
	leal	40(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L142
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 40(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L142:
	xorl	%eax, %eax
	testl	%ebx, %ebx
	jne	.L146
	jmp	.L117
OK, thanks, but you don't give the code for my version :) It is shorter and unrolled, as you can see, and has nicely predicted branches.

00000fc0 <pfifo_fast_dequeue>:
      fc0:       56                      push   %esi
      fc1:       89 c1                   mov    %eax,%ecx
      fc3:       53                      push   %ebx
      fc4:       8d 98 a0 00 00 00       lea    0xa0(%eax),%ebx
      fca:       39 98 a0 00 00 00       cmp    %ebx,0xa0(%eax)
      fd0:       89 da                   mov    %ebx,%edx
      fd2:       75 22                   jne    ff6 <pfifo_fast_dequeue+0x36>
      fd4:       8d 90 c4 00 00 00       lea    0xc4(%eax),%edx
      fda:       39 90 c4 00 00 00       cmp    %edx,0xc4(%eax)
      fe0:       89 d3                   mov    %edx,%ebx
      fe2:       75 12                   jne    ff6 <pfifo_fast_dequeue+0x36>
      fe4:       8d 98 e8 00 00 00       lea    0xe8(%eax),%ebx
      fea:       31 f6                   xor    %esi,%esi
      fec:       39 98 e8 00 00 00       cmp    %ebx,0xe8(%eax)
      ff2:       89 da                   mov    %ebx,%edx
      ff4:       74 27                   je     101d <pfifo_fast_dequeue+0x5d>
      ff6:       8b 32                   mov    (%edx),%esi
      ff8:       39 d6                   cmp    %edx,%esi
      ffa:       74 26                   je     1022 <pfifo_fast_dequeue+0x62>
      ffc:       8b 06                   mov    (%esi),%eax
      ffe:       ff 4b 08                decl   0x8(%ebx)
     1001:       c7 46 08 00 00 00 00    movl   $0x0,0x8(%esi)
     1008:       89 50 04                mov    %edx,0x4(%eax)
     100b:       89 02                   mov    %eax,(%edx)
     100d:       c7 46 04 00 00 00 00    movl   $0x0,0x4(%esi)
     1014:       c7 06 00 00 00 00       movl   $0x0,(%esi)
     101a:       ff 49 28                decl   0x28(%ecx)
     101d:       5b                      pop    %ebx
     101e:       89 f0                   mov    %esi,%eax
     1020:       5e                      pop    %esi
     1021:       c3                      ret
     1022:       ff 49 28                decl   0x28(%ecx)
     1025:       31 f6                   xor    %esi,%esi
     1027:       eb f4                   jmp    101d <pfifo_fast_dequeue+0x5d>
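
For readers comparing the two listings, here is a minimal user-space sketch of the two source shapes under discussion: a loop over the priority bands versus a hand-unrolled selection of the first non-empty band followed by a single dequeue path. This is not the patch from this thread and not the kernel's sch_generic.c. `struct pkt`, `struct fifo`, `struct sched`, `enqueue`, `pop`, `dequeue_loop` and `dequeue_unrolled` are hypothetical stand-ins, and the singly linked FIFO replaces the kernel's sk_buff lists purely for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3                 /* three priority bands, as in pfifo_fast */

/* Hypothetical stand-in for a packet (not struct sk_buff). */
struct pkt {
    struct pkt *next;
};

/* Hypothetical stand-in for one band's queue. */
struct fifo {
    struct pkt *head, *tail;
};

/* Hypothetical stand-in for the qdisc: one FIFO per band plus a count. */
struct sched {
    struct fifo band[BANDS];
    int qlen;
};

/* Append a packet to the FIFO of the given band (0 = highest priority). */
void enqueue(struct sched *s, int prio, struct pkt *p)
{
    struct fifo *f;

    assert(prio >= 0 && prio < BANDS);
    f = &s->band[prio];
    p->next = NULL;
    if (f->tail)
        f->tail->next = p;
    else
        f->head = p;
    f->tail = p;
    s->qlen++;
}

/* Remove and return the first packet of one FIFO, or NULL if it is empty. */
struct pkt *pop(struct fifo *f)
{
    struct pkt *p = f->head;

    if (p) {
        f->head = p->next;
        if (!f->head)
            f->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/* Loop form: whether this gets unrolled is left to the compiler and flags. */
struct pkt *dequeue_loop(struct sched *s)
{
    int prio;

    for (prio = 0; prio < BANDS; prio++) {
        struct pkt *p = pop(&s->band[prio]);
        if (p) {
            s->qlen--;
            return p;
        }
    }
    return NULL;
}

/* Hand-unrolled form: pick the first non-empty band, then dequeue once. */
struct pkt *dequeue_unrolled(struct sched *s)
{
    struct fifo *f;

    if (s->band[0].head)
        f = &s->band[0];
    else if (s->band[1].head)
        f = &s->band[1];
    else if (s->band[2].head)
        f = &s->band[2];
    else
        return NULL;

    s->qlen--;
    return pop(f);
}
```

Both functions return packets in the same order. The difference is only in code shape: the unrolled form tests the three bands up front and shares one unlink sequence, while the loop form repeats the unlink attempt per band unless the compiler restructures it.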

I just noticed that this is a local modification of my own, so in
the vanilla tree it indeed doesn't have any impact on the code
generated.

Still, your patch does not make sense to me. The latest tree
also includes my pfifo_fast changes, which modified the code to
maintain a backlog and made it easy to add more fifos at compile
time.  If you want the loop unrolled, then let the compiler do it
via -funroll-loops. This kind of optimization seems as unnecessary
to me as all the loopback optimizations.
I don't want to change compiler flags in my tree and lose this optimization when 2.6.13 is released.

I don't know about the loopback optimizations; I am not involved with that stuff. Maybe you are mistaking me for someone else?

It seems to me that you are giving unrelated arguments.
I don't know what your plans are, but mine were not to say that you are writing bad code.
I just wanted to give my performance analysis and feedback; I'm sorry if it hurts you.


Eric Dumazet