Re: TCP and reordering
From: Benjamin LaHaise <bcrl@kvack.org>
Date: 2012-11-28 16:19:32
On Wed, Nov 28, 2012 at 03:47:15PM +0000, David Woodhouse wrote:
> On Wed, 2012-11-28 at 04:52 -0800, Eric Dumazet wrote:
> > BQL is nice for high speed adapters.
>
> For adapters with hugely deep queues, surely? There's a massive correlation between the two, of course, but PPP over L2TP or PPPoE ought to be included in the classification, right?
Possibly, but there are many setups where PPPoE/L2TP do not connect to the congested link directly.
> > For slow one, you always can stop the queue for each packet given to start_xmit() And restart the queue at TX completion.
>
> Well yes, but only if we get notified of TX completion. It's simple enough for the tty-based channels, and we can do it with a vcc->pop() function for PPPoATM. But for PPPoE and L2TP, how do we do it? We can install a skb destructor... but then we're stomping on TSQ's use of the destructor by orphaning it too soon. I'm pondering something along the lines of
>
>     if (skb->destructor) {
>         newskb = skb_clone(skb, GFP_KERNEL);
>         if (newskb) {
>             skb_shinfo(newskb) = skb;
>             skb = newskb;
>         }
>     }
>     skb_orphan(skb);
>     skb->destructor = ppp_chan_tx_completed;
>
> ... and then ppp_chan_tx_completed can also destroy the original skb (and hence invoke TSQ's destructor too) when the time comes. And in the (common?) case where we don't have an existing destructor, we don't bother with the skb_clone.
This sort of chaining of destructors is going to be very expensive in terms of CPU cycles. If this does get implemented, please ensure there is a way to turn it off. Specifically, I'm thinking of the access concentrator roles for BRAS. In many wholesale ISP setups, there are many incoming sessions coming in over a high speed link (gigabit or greater) for which the access concentrator (LAC/LNS in L2TP speak) has no idea of the bandwidth of the link actually facing the customer. Such systems are usually operated in a way to avoid ever congesting the aggregation network. In such setups, BQL on the L2TP/PPPoE interface only serves to increase CPU overhead. That said, if there is local congestion, the benefits of BQL would be worthwhile to have.
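To make the shape of the request concrete, here is a rough userspace model of chained destructors with an off switch. It is only a sketch, not kernel code: struct pkt stands in for struct sk_buff, and the orig field, pkt_free(), ppp_prepare_xmit(), tsq_destructor() and the chain_tx_completion flag are all hypothetical names made up for illustration. Only the name ppp_chan_tx_completed comes from the proposal quoted above. On allocation failure this model simply skips completion tracking, which is a choice of the sketch rather than what the quoted pseudocode does.

```c
/* Userspace model only: struct pkt stands in for struct sk_buff. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct pkt;
typedef void (*pkt_destructor_t)(struct pkt *);

struct pkt {
    pkt_destructor_t destructor; /* models skb->destructor */
    struct pkt *orig;            /* hypothetical link to the wrapped packet */
};

/* Hypothetical off switch: when false, the PPP layer never touches the
 * destructor, so the fast path pays nothing for completion tracking. */
bool chain_tx_completion = true;

/* Counters so the effect of each destructor can be observed. */
int tsq_calls;
int ppp_calls;

/* Models freeing a packet: run the destructor, then release the memory. */
void pkt_free(struct pkt *p)
{
    assert(p != NULL);
    if (p->destructor)
        p->destructor(p);
    free(p);
}

/* Stands in for the upper layer's (e.g. TSQ's) existing destructor. */
void tsq_destructor(struct pkt *p)
{
    (void)p;
    tsq_calls++;
}

/* TX completion hook for the channel. Freeing the wrapped original here
 * is what finally runs the upper layer's destructor. */
void ppp_chan_tx_completed(struct pkt *p)
{
    ppp_calls++;
    if (p->orig)
        pkt_free(p->orig);
}

/* Returns the packet that should be handed to the lower layer. */
struct pkt *ppp_prepare_xmit(struct pkt *p)
{
    if (!chain_tx_completion)
        return p; /* feature off: leave the packet untouched */

    if (p->destructor) {
        /* Someone already owns the destructor: wrap instead of orphaning. */
        struct pkt *wrap = calloc(1, sizeof(*wrap));
        if (!wrap)
            return p; /* sketch's choice: send without completion tracking */
        wrap->orig = p;
        p = wrap;
    }
    p->destructor = ppp_chan_tx_completed;
    return p;
}
```

In this model, a packet that already carries a destructor costs one extra allocation and one extra indirect call per transmit, which is the per-packet overhead an access concentrator would want to avoid by clearing the flag.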
> But I wish there was a nicer way to chain destructors. And no, I don't count what GSO does. We can't use the cb here anyway since we're passing it down the stack.
I think all the tunneling protocols are going to have the same problem here, so it deserves some thought about how to tackle the issue in a generic way without incurring a large amount of overhead. This exact problem is one of the reasons multilink PPP often doesn't work well over L2TP or PPPoE as compared to its behaviour over ttys.

		-ben
-- 
"Thought is the essence of where you are now."