Thread: 12 messages, 4 authors, 2010-01-29

Re: [RFC] [PATCH] Optimize TCP sendmsg in favour of fast devices?

From: Krishna Kumar2 <hidden>
Date: 2010-01-27 07:00:18

Hi Herbert,

Sorry for the late response.

Herbert Xu [off-list ref] wrote on 01/21/2010 03:11 PM:
> > > I had to remove the F_SG flag from cxgb3 driver (using ethtool
> > > didn't show any difference in performance since GSO was enabled
> > > on the device due to register_netdev setting it). Testing shows a
> > > drop of 25% in performance with this patch for non-SG device,
> > > the extra alloc/memcpy is showing up.
> > >
> > > For the SG driver, I get a good performance gain (not anywhere
> > > close to 25% though). What do you suggest?
> >
> > I don't think we can add your change if it hurts non-SG
> > devices that much.
>
> Wait, we need to be careful when testing this.  Non-SG devices
> do actually benefit from TSO which they otherwise cannot access.
>
> If you unset the F_SG bit, then that would disable TSO too.  So
> you need to enable GSO to compensate.  So Krishna, did you check
> with tcpdump to see if GSO was really enabled with SG off?

OK, I unset F_SG and set F_GSO (in the driver). With this, tcpdump shows
that GSO is enabled: the TCP packet sizes build up to 65160 bytes.
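
As a rough sanity check on that packet size, the sketch below tests whether 65160 bytes is a whole number of MSS-sized segments, which is what a GSO super-packet would look like. The MTU of 1500 and the 12-byte TCP timestamp option are assumptions and were not stated in the test setup.

```python
# Assumed link parameters (not given in the message): a 1500-byte MTU
# and TCP timestamps enabled.
MTU = 1500
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12  # 10-byte option plus 2 bytes of padding

# Payload bytes per on-wire segment under the assumptions above.
mss = MTU - IP_HEADER - TCP_HEADER - TCP_TIMESTAMP_OPTION

# Largest TCP packet size reported from tcpdump.
observed = 65160

# If GSO is building super-packets, the size should be a multiple of the MSS.
segments, remainder = divmod(observed, mss)
print(mss, segments, remainder)
```

Under these assumptions the observed size divides evenly into MSS-sized segments, which is consistent with segmentation being deferred until just before the driver.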

I ran 5 serial netperf runs with a 16K I/O size and another 5 with a
64K I/O size; the aggregate results are:

0. Driver unsets F_SG but sets F_GSO:
      Original code with 16K: 19471.65
      New code with 16K:      19409.70
      Original code with 64K: 21357.23
      New code with 64K:      22050.42

To recap the other tests I did today:

1. Driver unsets F_SG, with GSO off:
      Original code with 16K: 10123.56
      New code with 16K:      7111.12
      Original code with 64K: 11568.99
      New code with 64K:      7611.37

2. Driver unsets F_SG and uses ethtool to set GSO:
      Original code with 16K: 18864.38
      New code with 16K:      18465.54
      Original code with 64K: 21005.43
      New code with 64K:      22529.24
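
To make the three cases easier to compare, the relative change of the new code against the original can be computed directly from the figures above. This is only a small sketch over the reported aggregates; the dictionary layout and the helper name `percent_change` are illustrative.

```python
# Aggregate netperf results copied from the tables above:
# (original code, new code) per I/O size.
results = {
    "0. F_SG off, F_GSO set in driver": {
        "16K": (19471.65, 19409.70),
        "64K": (21357.23, 22050.42),
    },
    "1. F_SG off, GSO off": {
        "16K": (10123.56, 7111.12),
        "64K": (11568.99, 7611.37),
    },
    "2. F_SG off, GSO set via ethtool": {
        "16K": (18864.38, 18465.54),
        "64K": (21005.43, 22529.24),
    },
}


def percent_change(original, new):
    """Relative change of `new` versus `original`, in percent."""
    return (new - original) / original * 100.0


changes = {
    case: {size: percent_change(orig, new) for size, (orig, new) in sizes.items()}
    for case, sizes in results.items()
}

for case, sizes in changes.items():
    for size, pct in sizes.items():
        print(f"{case:34s} {size}: {pct:+.1f}%")
```

With GSO off the new code loses roughly 30% at 16K and 34% at 64K. With GSO on (either way of enabling it) the 16K results are within about 2% of the original and the 64K results improve by roughly 3% to 7%.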

Thanks,

- KK
IIRC when I did a similar test with e1000 back when I wrote this
the performance of GSO with SG off was pretty much the same as
no GSO with SG off.
  