Thread (35 messages), 6 authors, 2008-09-12

Re: using software TSO on non-TSO capable netdevices

From: Lennert Buytenhek <hidden>
Date: 2008-07-31 09:50:52

On Thu, Jul 31, 2008 at 10:34:13AM +0300, Ilpo Järvinen wrote:
The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
 ...
I.e. dramatic CPU time improvements, and some overall speedup as well.

I wonder if something like this can be done in a less hacky fashion --
the hard part I guess is deciding when to keep coalescing (to reduce
CPU overhead) vs. when to push out what has been coalesced so far (in
order to keep the pipe filled), and I'm not sure I have good ideas
about how to make that decision.
Interesting, I'll take a closer look at this.

Actually your patch is less of a surprise, because one of the issues I
had to surmount constantly when rewriting the TSO output path was the
implicit conflict between TSO deferral (to accumulate segments) and
the nagle logic.
Your statement makes very little sense to me (though I had to look up
the meaning of "surmount", but that seems not so significant anyway)...
They both work in the same direction, i.e., to delay sending in order to
prevent excessive processing of small bits, but their regions of
operation shouldn't overlap (nagle works with < mss, and the TSO
deferring logic basically begins where nagle ends)?

It seems to me that this is not about a conflict between TSO deferring
and the nagle sub-mss logic at all (perhaps there wasn't as direct a
relation to this issue as I read...?). AFAICT, the change only makes the
(!nonagle && tp->packets_out && tcp_minshall_check(tp)) test in
tcp_nagle_check more likely to occur (and result in false), i.e.,
basically we end up using the nagle test also to prevent sending of
skbs that are >= mss, besides the usual functionality, which is to
prevent sending of < mss sized ones. ...Which seems just an extension of
what we checked for in tcp_tso_should_defer().
I wanted a way to get larger GSO segments, and the idea was to rig
the nagle check to consider sub-N*mss frames as small frames and not
let more than one of them into the pipe at any given time.  I don't
know whether the change I made accomplishes exactly that, but it did
end up giving me larger GSO segments, which was the goal.
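
To make the idea concrete, here is a minimal userspace sketch of a
nagle-like test whose "small frame" threshold is N*mss instead of mss.
It is only a model of the idea described above, not the patch and not
the kernel's tcp_nagle_check(): struct conn_model, nagle_like_hold() and
the small_factor parameter are hypothetical names, and small_unacked is
a stand-in for what tcp_minshall_check() reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the connection state consulted. */
struct conn_model {
    unsigned int packets_out; /* segments in flight, not yet ACKed */
    bool small_unacked;       /* stand-in for tcp_minshall_check():
                                 a "small" frame is still unacked */
    bool nonagle;             /* true if Nagle is disabled */
};

/*
 * Returns true when a frame of 'len' bytes should be held back.
 * small_factor == 1 models the classic sub-mss behaviour;
 * small_factor == N treats every frame below N*mss as small.
 */
bool nagle_like_hold(const struct conn_model *c, unsigned int len,
                     unsigned int mss, unsigned int small_factor)
{
    assert(mss > 0 && small_factor > 0);

    bool is_small = len < small_factor * mss;

    return is_small && !c->nonagle &&
           c->packets_out > 0 && c->small_unacked;
}
```

With small_factor set to 1, a 2*mss frame is never held back by this
test; with small_factor set to 4 it is held while an earlier small frame
is still in flight, which leaves time for more data to be coalesced
into it.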

It makes the GSO segment size distribution pretty chaotic, though:

10k seg: 2:851 3:430 4:3385 5:330 6:3611 7:382 8:949 9:18 10:43 11:1
10k size: 5:851 8:430 11:3385 14:330 17:3611 19:382 22:949 25:18 28:43 31:1
10k seg: 2:1952 3:410 4:2855 5:340 6:2956 7:356 8:1059 9:24 10:48
10k size: 5:1952 8:410 11:2855 14:340 17:2956 19:356 22:1059 25:24 28:48
10k seg: 2:1036 3:569 4:4824 5:369 6:2241 7:251 8:643 9:20 10:46 11:1
10k size: 5:1036 8:569 11:4824 14:369 17:2241 19:251 22:643 25:20 28:46 31:1
10k seg: 2:1270 3:408 4:3686 5:350 6:2910 7:319 8:988 9:15 10:54
10k size: 5:1270 8:408 11:3686 14:350 17:2910 19:319 22:988 25:15 28:54
10k seg: 2:870 3:407 4:4211 5:380 6:3392 7:286 8:389 9:20 10:45
10k size: 5:870 8:407 11:4211 14:380 17:3392 19:286 22:389 25:20 28:45
10k seg: 2:1217 3:411 4:3542 5:315 6:3263 7:348 8:832 9:23 10:48 11:1
10k size: 5:1217 8:411 11:3542 14:315 17:3263 19:348 22:832 25:23 28:48 31:1

("10k seg" numbers are the distribution of gso_segs for 10k skbuffs,
and "10k size" are the distribution of skb->len >> 9 for 10k skbuffs.)
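
For reference, histograms of this shape could be collected roughly as
follows. This is a userspace sketch under assumptions, not the
instrumentation actually used for the numbers above: struct skb_sample,
struct hist and the helper names are hypothetical, and only the two
bucket keys (gso_segs and len >> 9) come from the description.

```c
#include <stddef.h>
#include <stdio.h>

#define HIST_BUCKETS 64

/* Hypothetical record of the two per-skb values being sampled. */
struct skb_sample {
    unsigned int len;      /* stand-in for skb->len, in bytes */
    unsigned int gso_segs; /* number of segments in the GSO skb */
};

struct hist {
    unsigned long count[HIST_BUCKETS];
};

/* Count one observation; out-of-range values go to the last bucket. */
void hist_add(struct hist *h, unsigned int bucket)
{
    if (bucket >= HIST_BUCKETS)
        bucket = HIST_BUCKETS - 1;
    h->count[bucket]++;
}

/* Account one skb in both histograms. */
void account(struct hist *segs, struct hist *size,
             const struct skb_sample *s)
{
    hist_add(segs, s->gso_segs);
    hist_add(size, s->len >> 9); /* bucket width is 512 bytes */
}

/* Print only the non-empty buckets as "bucket:count" pairs. */
void hist_print(const char *label, const struct hist *h)
{
    printf("%s:", label);
    for (size_t i = 0; i < HIST_BUCKETS; i++)
        if (h->count[i])
            printf(" %zu:%lu", i, h->count[i]);
    printf("\n");
}
```

As a sanity check on the bucket mapping, and assuming an MSS of 1460
bytes (the message does not state the MSS), a 6-segment skb has a len of
8760 bytes, and 8760 >> 9 is 17, which matches the way the 6-segment and
size-17 counts line up in the output above.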