Thread: 15 messages, 4 authors, 2010-02-17

Re: [PATCH 4/4] Staging: Octeon: Free transmit SKBs in a timely manner.

From: David Daney <hidden>
Date: 2010-02-15 22:05:29
Also in: netdev

On 02/15/2010 01:11 PM, Eric Dumazet wrote:
On Monday, 15 February 2010 at 12:41 -0800, David Daney wrote:
On 02/15/2010 12:27 PM, Eric Dumazet wrote:
On Monday, 15 February 2010 at 12:13 -0800, David Daney wrote:
If we wait for the once-per-second cleanup to free transmit SKBs,
sockets with small transmit buffer sizes might spend most of their
time blocked waiting for the cleanup.

Normally we do a cleanup for each transmitted packet.  We add a
watchdog-type timer so that we also schedule a timeout for 150 us after
a packet is transmitted.  The watchdog is reset for each transmitted
packet, so for high packet rates, it never expires.  At these high
rates, the cleanups are done for each packet so the extra
watchdog-initiated cleanups are not needed.
s/needed/fired/
or perhaps s/are not needed/are neither needed nor fired/
Hmm, but re-arming a timer for each transmitted packet must have a cost?
The cost is fairly low (less than 10 processor clock cycles).  We didn't
add this for amusement; people actually do things like only send UDP
packets from userspace.  Since we can fill the transmit queue faster
than it is emptied, the socket transmit buffer is quickly consumed.  If
we don't free the SKBs in short order, the transmitting process gets to
take a long sleep (until the once-per-second cleanup task, our previous
mechanism, runs).
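To make the scheme concrete, here is a minimal user-space C sketch of the per-packet cleanup plus re-armed watchdog described in the patch description above. It is a simulation, not the driver code: `struct tx_sim`, `sim_xmit`, `sim_advance` and `sim_cleanup` are hypothetical names, the clock is a plain microsecond counter, and it assumes the hardware has already finished with every earlier packet by the time the next one is submitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WATCHDOG_US 150 /* timeout taken from the patch description */

/* Hypothetical simulation state; none of these names come from the driver. */
struct tx_sim {
    uint64_t deadline_us;         /* when the watchdog would fire, if armed */
    int armed;                    /* non-zero while the watchdog is pending */
    unsigned pending;             /* transmitted SKBs not yet freed */
    unsigned per_packet_cleanups; /* cleanups done on the transmit path */
    unsigned watchdog_cleanups;   /* cleanups done by the watchdog */
};

/* Stand-in for the real cleanup: assume every pending SKB can be freed. */
static void sim_cleanup(struct tx_sim *s)
{
    s->pending = 0;
}

/* Move the clock to now_us and fire the watchdog if its deadline passed. */
static void sim_advance(struct tx_sim *s, uint64_t now_us)
{
    assert(s != NULL);
    if (s->armed && now_us >= s->deadline_us) {
        s->armed = 0;
        if (s->pending) {
            sim_cleanup(s);
            s->watchdog_cleanups++;
        }
    }
}

/* Transmit one packet at time now_us. */
static void sim_xmit(struct tx_sim *s, uint64_t now_us)
{
    sim_advance(s, now_us);

    /* Per-packet cleanup: free SKBs left over from earlier transmits. */
    if (s->pending) {
        sim_cleanup(s);
        s->per_packet_cleanups++;
    }

    s->pending++;                          /* this SKB is freed later */
    s->deadline_us = now_us + WATCHDOG_US; /* re-arm: push the deadline out */
    s->armed = 1;
}
```

In this model, calling `sim_xmit` for 100 packets spaced 10 us apart never lets the watchdog expire; all but the last SKB are freed on the transmit path, and the last one is freed by the watchdog 150 us after it was sent rather than at the next once-per-second cleanup.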
I understand this, but traditionally, NIC drivers don't use a timer, but a
'TX complete' interrupt, that usually fires a few us after packet
submission at Gigabit speed.
Indeed.  Lacking this type of interrupt, the watchdog seemed the best
short-term solution.

I am investigating the possibility of feeding TX complete notifications
back through the RX path, where it is possible to generate interrupts.
The drawback to this is that it costs a lot more CPU cycles and adds
cache pressure.
A fast program could try to send X small UDP packets in less than 150
us, X being greater than the size of your TX ring.
The size of my TX queue (it is not a ring) can be made arbitrarily large
(currently 1000).  64 bytes * 1000 packets * 10 bits/byte / 1e9
bits/sec == 640 us.  My watchdog will fire before 1/4 of the
queue capacity has drained.
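As a rough check of the arithmetic above, here is a minimal C sketch. The function names are hypothetical, and it assumes 10 bits on the wire per byte and a line rate of 1e9 bits/sec, which is the rate that yields the 640 us figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time, in microseconds, to put a full transmit queue on the wire.
 * bits_per_byte is the on-wire cost of one byte (10 is the rough figure
 * used in the message), line_rate_bps is the port speed in bits/sec.
 */
static double queue_drain_us(uint32_t packet_bytes, uint32_t queue_len,
                             uint32_t bits_per_byte, double line_rate_bps)
{
    assert(line_rate_bps > 0.0);
    double bits = (double)packet_bytes * queue_len * bits_per_byte;
    return bits / line_rate_bps * 1e6;
}

/*
 * Fraction of the queue already transmitted when a watchdog set for
 * watchdog_us fires, capped at 1.0 (queue fully drained before it fires).
 */
static double fraction_sent_before_watchdog(double watchdog_us,
                                            double drain_us)
{
    assert(drain_us > 0.0);
    double f = watchdog_us / drain_us;
    return f > 1.0 ? 1.0 : f;
}
```

Under these assumptions, `queue_drain_us(64, 1000, 10, 1e9)` gives roughly 640 us, so a 150 us watchdog fires with under a quarter of the queue sent. At 1e10 bits/sec the same queue drains in roughly 64 us, which is shorter than the watchdog period.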
So your patch makes the window smaller, but it is still there (at the
physical layer, we'll see a burst of packets, a ~100 us delay, then a
second burst).
With this patch, there will be no burstiness using default socket buffer 
sizes and packets of arbitrary size on a standard 1gig port.

On the 10gig ports there is the possibility for burstiness as you aptly 
explain.  However, in practice it would be difficult to arrange things 
to achieve sufficiently high packet rates, so we can live with it like this.

David Daney