Thread: 13 messages, 5 authors, 2012-07-30

Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb

From: Ben Greear <hidden>
Date: 2012-07-30 21:00:32

On 07/30/2012 12:41 PM, Ben Hutchings wrote:
> On Mon, 2012-07-30 at 10:23 -0700, Ben Greear wrote:
>> On 07/30/2012 10:16 AM, Ben Hutchings wrote:
>>> A peer (or local user) may cause TCP to use a nominal MSS of as little
>>> as 88 (actual MSS of 76 with timestamps).  Given that we have a
>>> sufficiently prodigious local sender and the peer ACKs quickly enough,
>>> it is nevertheless possible to grow the window for such a connection
>>> to the point that we will try to send just under 64K at once.  This
>>> results in a single skb that expands to 861 segments.
>>>
>>> In some drivers with TSO support, such an skb will require hundreds of
>>> DMA descriptors; a substantial fraction of a TX ring or even more than
>>> a full ring.  The TX queue selected for the skb may stall and trigger
>>> the TX watchdog repeatedly (since the problem skb will be retried
>>> after the TX reset).  This particularly affects sfc, for which the
>>> issue is designated as CVE-2012-3412.  However it may be that some
>>> hardware or firmware also fails to handle such an extreme TSO request
>>> correctly.
>>>
>>> Therefore, limit the number of segments per skb to 100.  This should
>>> make no difference to behaviour unless the actual MSS is less than
>>> about 700.
>>
>> Please do not do this...or at least allow over-rides.  We love
>> the trick of setting a very small MSS and making the NICs generate
>> huge numbers of small TCP frames with efficient user-space
>> logic.   We use this for stateful TCP load testing when high
>> numbers of TCP packets-per-second are desired.
>
> Please test whether this actually makes a difference - my suspicion is
> that 100 segments per skb is easily enough to prevent the host being a
> bottleneck.

Any CPU I can save I can use for other tasks.  If we can use the
NIC's offload features to segment pkts, then we get a near-linear
increase in pkts-per-second by adding NICs...at least up to whatever
the total bandwidth of the system is...

If you want to have the OS default to a safe value, that is
fine by me...but please give us a tunable so that we can get
the old behaviour.

It's always possible I'm not the only one using this,
and I think it would be considered bad form to break
existing features and provide no work-around.

Thanks,
Ben

>> Intel NICs, including 10G, work just fine with minimal MSS
>> in this scenario.
>
> I'll leave this to the Intel maintainers to answer.
>
> Ben.

-- 
Ben Greear [off-list ref]
Candela Technologies Inc  http://www.candelatech.com