Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

From: leroy christophe <hidden>
Date: 2015-08-17 13:05:45
Also in: lkml


On 17/08/2015 13:00, leroy christophe wrote:


On 17/08/2015 12:56, leroy christophe wrote:

quoted


On 07/08/2015 01:25, Segher Boessenkool wrote:

quoted

On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:

quoted

If this makes performance non-negligibly worse on other 32-bit chips, and is
an important improvement on 8xx, then we can use an ifdef since 8xx already
requires its own kernel build.  I'd prefer to see a benchmark showing that it
actually does make things worse on those chips, though.

And I'd like to see a benchmark that shows it *does not* hurt performance
on most chips, and does improve things on 8xx, and by how much. But it
isn't *me* who has to show that, it is not my patch.

Ok, following this discussion I made some additional measurements, and
it looks like:
* There is almost no change on the 885
* There is a non-negligible degradation on the 8323 (19.5 tb ticks
instead of 15.3)

Thanks for pointing this out, I think my patch is therefore not good.

Oops, I was talking about my other patch, the one that was to optimise
ip_fast_csum.
I still have to measure csum_partial.

Now I have the results for csum_partial(). The measurement is done with
mftbl() before and after calling the function, with IRQs off to get a
stable measurement. The workload is a transfer of the vmlinux file, done
3 times via scp to the target. We get approximately 50000 calls to
csum_partial().
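
For readers who want to reproduce the idea, here is a minimal user-space
sketch of the bookkeeping described above: read a counter before and after
each call, accumulate the difference, and divide by the number of calls.
It is not the code used for the numbers in this mail. read_ticks(),
toy_csum(), timed_csum(), mean_ticks() and struct tick_stats are
hypothetical names; clock() stands in for mftbl(), toy_csum() stands in
for csum_partial(), and the IRQ-off part is left out because it has no
user-space equivalent.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for mftbl(): any monotonically increasing counter. */
static uint64_t read_ticks(void)
{
	return (uint64_t)clock();
}

/* Running totals from which the mean time per call is derived. */
struct tick_stats {
	uint64_t total_ticks;
	uint64_t calls;
};

/*
 * Simplified stand-in for csum_partial(): sums big-endian 16-bit words
 * into a 32-bit accumulator with end-around carry. Illustration only.
 */
static uint32_t toy_csum(const unsigned char *buf, size_t len, uint32_t sum)
{
	uint64_t acc = sum;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte */
		acc += (uint32_t)buf[i] << 8;
	while (acc >> 32)		/* fold carries back in */
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (uint32_t)acc;
}

/* Wrap one call with a counter read before and after it. */
static uint32_t timed_csum(struct tick_stats *st, const unsigned char *buf,
			   size_t len, uint32_t sum)
{
	uint64_t t0 = read_ticks();
	uint32_t ret = toy_csum(buf, len, sum);
	uint64_t t1 = read_ticks();

	st->total_ticks += t1 - t0;
	st->calls++;
	return ret;
}

/* Mean ticks per call, or 0 if nothing was measured. */
static double mean_ticks(const struct tick_stats *st)
{
	return st->calls ? (double)st->total_ticks / (double)st->calls : 0.0;
}
```

Keeping only a running total and a call count means nothing has to be
stored per call, which matters when there are tens of thousands of calls.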

On MPC885:
1/ Without the patchset, mean time spent in csum_partial() is 167 tb ticks.
2/ With the patchset, mean time is 150 tb ticks

On MPC8323:
1/ Without the patchset, mean time is 287 tb ticks
2/ With the patchset, mean time is 256 tb ticks

The improvement is approximately 10% in both cases.
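
The percentage follows directly from the mean times above, as the
reduction relative to the time without the patchset. A small helper
(improvement_pct() is a hypothetical name, used only for illustration)
makes the arithmetic explicit:

```c
/* Relative reduction in mean time, as a percentage of the original time. */
static double improvement_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the figures above, improvement_pct(167, 150) is 17/167, about 10.2%
for the MPC885, and improvement_pct(287, 256) is 31/287, about 10.8% for
the MPC8323.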

So, unlike my patch on ip_fast_csum(), this one is worth it.

Christophe
