Re: [0/14] GRO: Lots of microoptimisations

From: Benjamin LaHaise <hidden>
Date: 2009-05-28 15:34:08

On Thu, May 28, 2009 at 09:08:58AM +1000, Herbert Xu wrote:
> On Wed, May 27, 2009 at 01:52:23PM -0400, Benjamin LaHaise wrote:
> >
> > A few questions for you: I've been looking a bit into potential GRO
> > optimisations that are possible with the vxge driver.  At least from my
> > existing testing on a P4 Xeon, it seems that doing packet rx via
> > napi_gro_receive() was a bit slower.  I'll retest with these changes
>
> Slower compared to LRO or GRO off?

With GRO off I'm getting ~4.7-5Gbps to the receiver which is CPU bound with 
netperf.  With GRO on, that drops to ~3.9-4.3Gbps.  The only real difference 
is the entry point into the net code being napi_gro_receive() vs 
netif_receive_skb().
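
To make the comparison concrete, here is a rough userspace model of the single
branch that differs between the two runs.  It is only a sketch: every stub_*
name is hypothetical and merely stands in for the real napi_gro_receive() and
netif_receive_skb() entry points, which are not reproduced here.

```c
/* Userspace model only.  The stub_* names are hypothetical stand-ins,
 * not the kernel's napi_gro_receive()/netif_receive_skb(). */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct stub_skb {
	unsigned int len;		/* packet length in bytes */
};

struct stub_rx_stats {
	unsigned long gro_pkts;		/* packets sent to the GRO entry point */
	unsigned long plain_pkts;	/* packets sent straight to the stack */
};

/* Stand-in for the GRO entry point: only counts the packet. */
static void stub_gro_receive(struct stub_rx_stats *st,
			     const struct stub_skb *skb)
{
	(void)skb;
	st->gro_pkts++;
}

/* Stand-in for the non-GRO entry point: only counts the packet. */
static void stub_receive_skb(struct stub_rx_stats *st,
			     const struct stub_skb *skb)
{
	(void)skb;
	st->plain_pkts++;
}

/* Per-packet rx dispatch: the entry point is the only thing that changes. */
void stub_rx_one(struct stub_rx_stats *st, const struct stub_skb *skb,
		 bool gro_on)
{
	assert(st != NULL && skb != NULL);

	if (gro_on)
		stub_gro_receive(st, skb);
	else
		stub_receive_skb(st, skb);
}
```

Calling stub_rx_one() with gro_on set bumps gro_pkts, otherwise plain_pkts;
everything before the branch is identical in both configurations.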

> > of yours.  What platform have your tests been run on?  Also, do you have
> > any notes/ideas on how best to make use of the GRO functionality within
> > the kernel?  I'm hoping it's possible to make use of a few of the hardware
> > hints to improve fast path performance.
>
> What sort of hints do you have?

We have a few bits in the hardware descriptor which indicate if the packet 
is TCP or UDP, IPv4 or IPv6, as well as whether TCP packets are fast path 
eligible.  The hardware can also split up the headers to place the ethernet 
MAC, IP and payload in separate buffers.  I plan to run a few tests to see 
if dispatching directly from the driver into the TCP fast path makes much 
difference.
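
As an illustration of how such hints might steer dispatch, here is a minimal
sketch.  The bit positions, the HINT_* names and the rx_path enum are all
assumptions made up for the example; the real descriptor layout is not given
in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hint bits; the actual hardware layout may differ. */
#define HINT_L4_TCP	(1u << 0)	/* packet is TCP */
#define HINT_L4_UDP	(1u << 1)	/* packet is UDP */
#define HINT_L3_IPV4	(1u << 2)	/* packet is IPv4 */
#define HINT_L3_IPV6	(1u << 3)	/* packet is IPv6 */
#define HINT_TCP_FAST	(1u << 4)	/* TCP packet is fast path eligible */

enum rx_path {
	RX_PATH_GENERIC,	/* no usable hint: normal receive path */
	RX_PATH_GRO,		/* TCP over a known L3 protocol: try GRO */
	RX_PATH_TCP_FAST	/* hardware says fast path eligible */
};

/* Pick a receive path from the descriptor hint bits alone. */
enum rx_path rx_classify(uint32_t hints)
{
	bool l3_known = (hints & (HINT_L3_IPV4 | HINT_L3_IPV6)) != 0;
	bool is_tcp = (hints & HINT_L4_TCP) != 0;

	if (!l3_known || !is_tcp)
		return RX_PATH_GENERIC;
	if (hints & HINT_TCP_FAST)
		return RX_PATH_TCP_FAST;
	return RX_PATH_GRO;
}

/* Usage example: a few descriptors and the path each one selects. */
void rx_classify_selfcheck(void)
{
	assert(rx_classify(HINT_L3_IPV4 | HINT_L4_UDP) == RX_PATH_GENERIC);
	assert(rx_classify(HINT_L3_IPV6 | HINT_L4_TCP) == RX_PATH_GRO);
	assert(rx_classify(HINT_L3_IPV4 | HINT_L4_TCP | HINT_TCP_FAST)
	       == RX_PATH_TCP_FAST);
}
```

The point of classifying from the descriptor is that the choice is made
without touching the packet headers; whether that saves anything measurable
is exactly what the tests would need to show.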

		-ben
