Re: netif_rx packet dumping

From: jamal <hidden>
Date: 2005-03-07 13:55:11

On Fri, 2005-03-04 at 03:47, Baruch Even wrote:

jamal wrote:

quoted

Can you explain a little more? Why does the throttling cause any
bad behavior that's any different from the queue being full? In both
cases, packets arriving during that transient will be dropped.

If you have 300 packets in the queue and the throttling kicks in, you now 
drop ALL packets until the queue is empty. This will normally take some 
time, and during all of that time you are dropping all the ACKs that are 
coming in. You lose SACK information and potentially leave no packet 
in flight, so the next packet will be sent only when the retransmit 
timer wakes up, at which point your congestion control algorithm starts
from cwnd=1.

You can look at the report http://hamilton.ie/net/LinuxHighSpeed.pdf for 
some graphs of the effects.
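
The difference between the two drop policies discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's netif_rx code: the function name, the tick-based service model and all the numbers (a limit of 300, 50 packets serviced per tick, a 400-packet burst followed by 40 packets per tick) are made up for illustration.

```python
def simulate(arrivals, service_per_tick, limit, throttle):
    """Toy bounded receive queue; returns the number of drops in each tick.

    arrivals         -- packets arriving in each tick (hypothetical workload)
    service_per_tick -- packets the stack dequeues per tick (assumed constant)
    limit            -- maximum queue length
    throttle         -- if True, once the queue overflows every arriving
                        packet is dropped until the queue has fully drained;
                        if False, plain drop-tail
    """
    qlen = 0
    throttled = False
    drops = []
    for n in arrivals:
        dropped = 0
        for _ in range(n):
            if throttle and throttled:
                if qlen == 0:
                    throttled = False      # queue drained: accept again
                else:
                    dropped += 1           # still draining: drop everything
                    continue
            if qlen >= limit:
                dropped += 1               # queue full: drop this packet
                if throttle:
                    throttled = True
                continue
            qlen += 1
        qlen = max(0, qlen - service_per_tick)
        drops.append(dropped)
    return drops


# One burst of 400 packets, then a steady 40 packets per tick.
arrivals = [400] + [40] * 8
drop_tail = simulate(arrivals, service_per_tick=50, limit=300, throttle=False)
throttled = simulate(arrivals, service_per_tick=50, limit=300, throttle=True)
print("drop-tail:", drop_tail, "total", sum(drop_tail))
print("throttle: ", throttled, "total", sum(throttled))
```

In this model drop-tail loses only the part of the burst that does not fit, while the throttled queue keeps dropping every arrival for the ticks it takes to drain. Whether that matters in practice depends on how long the drain really takes, which is the point under discussion.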

Always cool to see some test running across the pond. 

Were the processors tied to NICs? 
Your experiment is more than likely a single flow, correct?
In other words, the whole queue was in fact dedicated to just your one
flow - that's why you can call this queue a transient burst queue. 
Do you still have the experimental data, in particular how many packets
were dropped during this period? I am particularly interested in seeing
the softnet stats as well as the TCP netstats.
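
For anyone collecting the softnet stats, a minimal sketch of pulling the drop counters out of /proc/net/softnet_stat follows. It assumes the usual layout of one row per CPU with hexadecimal columns, of which the first three are packets processed, packets dropped and time_squeeze; the later columns differ between kernel versions and are ignored here. The function name and the sample text are made up for illustration.

```python
def parse_softnet_stat(text):
    """Parse the first three hex columns of /proc/net/softnet_stat.

    Assumed layout (check against your kernel): one row per CPU, columns
    are hexadecimal, and the first three are processed, dropped and
    time_squeeze. Any further columns are ignored.
    """
    rows = []
    lines = [line for line in text.splitlines() if line.strip()]
    for row, line in enumerate(lines):
        fields = line.split()
        if len(fields) < 3:
            continue  # not a row in the assumed format
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": row,
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


# Made-up sample in the assumed format, not measured data.
sample = (
    "0001a2b3 0000012c 00000005 00000003\n"
    "00000400 00000000 00000000 00000000\n"
)
stats = parse_softnet_stat(sample)
for s in stats:
    print(s)

# On a live system one would read the real file instead:
#   with open("/proc/net/softnet_stat") as f:
#       stats = parse_softnet_stat(f.read())
```

Taking a snapshot before and after a test run and subtracting the two gives the number of packets dropped during the run.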

I think your main problem was the huge amounts of SACK on the write queue
and the resultant processing, i.e. section 1.1 and how you resolved that.
I don't see any issue in dropping ACKs, even many of them, for such large
windows as you have - TCP's ACKs are cumulative. It is true that if you drop
"large" enough amounts of ACKs, you will end up in timeouts - but large
enough in your case must be at a minimum 1000 packets. And to say you
dropped 1000 packets while processing 300 means you were taking too
long processing the 300. So it would be interesting to see a repeat of
the test after you've resolved 1.1 but without removing the congestion
code. 
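
The cumulative ACK argument can be sketched in a few lines. This is a simplified illustration with made-up numbers: segments are counted rather than bytes, the function name is hypothetical, and only the cumulative ACK field is modelled, not SACK blocks.

```python
def highest_acked(acks_received, snd_una=0):
    """Return the sender's highest cumulative ACK after processing the
    ACKs that actually arrived. An ACK with value n covers every segment
    below n, so any later ACK makes the earlier ones redundant."""
    for ack in acks_received:
        snd_una = max(snd_una, ack)
    return snd_una


# The receiver generates one ACK per segment for a 1000-segment window.
all_acks = list(range(1, 1001))

# Case 1: 99 out of every 100 ACKs are dropped, but the stream continues.
survivors = [a for a in all_acks if a % 100 == 0]

# Case 2: every ACK after number 700 is dropped.
tail_lost = [a for a in all_acks if a <= 700]

print(highest_acked(all_acks))    # nothing dropped
print(highest_acked(survivors))   # same result despite 990 dropped ACKs
print(highest_acked(tail_lost))   # stalls: 300 segments left unacknowledged
```

Sparse drops leave the sender in the same state, while losing the whole tail of the ACK stream leaves data unacknowledged until something else arrives or the retransmit timer fires.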
Then what would be really interesting is to see the performance you get
from multiple flows with and without congestion. 
I am not against the benchmarky nature of the single flow and tuning
for that, but we should also look more widely at the effect before
you handwave based on the result of one test case.
In fact I would agree with giving you a way to turn off the congestion
control - and I am not sure how long we should keep it around with NAPI
getting more popular. I will prepare a simple patch.
What you really need to do eventually is use NAPI, not these antiquated
schemes.

I am also worried that since you used a non-NAPI driver, the effect of
reordering necessitating the UNDO is much, much higher.
So if I were you I would repeat 1.2 with the fix from 1.1 as well as
tying the NIC to one CPU. And it would be a good idea to present more
detailed results - not just TCP windows fluctuating (you may not need
the other parameters for the paper, but they would be useful to see for
debugging purposes).

quoted

the smart schemes are not going to make it that much better if 
the hardware/software can't keep up.

Consider that this queue could be shared by as many as a few thousand
unrelated TCP flows - not just one. It is also used for packets being
forwarded. If you factor in that the system has to react to protect itself,
then these schemes may make sense. The best place to do it is really in
hardware, but as close to the hardware as possible is the next best
spot.

Actually the problem we had was with TCP end-system performance. 
Compared to that, the router problem is more limited, since it 
only needs to do a lookup on a hash, tree or whatever and not walk a 
linked list of several thousand packets.

I am not sure I followed. If you mean routers don't use linked lists,
you are highly mistaken.

I'd prefer avoiding an AFQ scheme in the incoming queue. If you do add 
one, please make it configurable so I can disable it. The drop-tail 
behaviour is good enough for me. Remember that an AFQ needs to drop 
packets long before the queue is full so there will likely be more 
losses involved.

What I was suggesting to Stephen would probably make more sense to kick
in when there's congestion. Weighted windowing allows you to sense things
that are coming, so the idea was more to not allow new flows once 
we are congested. 
Just use a NAPI driver and you won't have to worry about this.

cheers,
jamal
