Thread (38 messages), 8 authors, 2014-10-02

Re: [net-next PATCH V5] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2014-10-01 15:34:57

On 10/01/14 10:55, Tom Herbert wrote:
On Wed, Oct 1, 2014 at 6:17 AM, Jamal Hadi Salim [off-list ref] wrote:
quoted
On 09/30/14 14:20, David Miller wrote:
quoted
From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Tue, 30 Sep 2014 07:07:37 -0400
quoted
Note, there are benefits as you have shown - but i would not
consider those to be standard use cases (actually one which would
have shown clear win is the VM thing Rusty was after).

I completely disagree, you will see at least decreased cpu utilization
for a very common case, bulk single stream transfers.

So let's say the common use case is:
= modern day cpu (pick some random cpu)
= 1-10 Gbps ethernet (not 100mbps)
= 1-24 tcp or udp bulk (you said one, Jesper had 24 which sounds better)

Run with test cases:
a) unchanged (no bulking code added at all)
vs
b) bulking code added and used
vs
c) bulking code added and *not* used

Jesper's results are comparing #b and #c.

And if #b + #c are slightly worse or equal then we have a win;->
BTW: meant to say if #b and #c are slightly worse than #a then we have
a win.
quoted
Again, I do believe things like traffic generators or the VM io
or something like tuntap that crosses user space will have a clear
benefit (but are those common use cases?).
You're making this much more complicated than it actually is. The
algorithm is simple: the queue wakes up, finds out exactly how many bytes
to dequeue, and performs dequeue of enough packets under one lock.
It is not about bql.
The issue is: if i am going to attempt to do a bulk transfer
every single time (with new code) and for the common use case the
result is "no need to do bulking" then you just added extra code that
is unnecessary for that common case.
Even a single extra if statement at high packet rate is still
costly and would be easy to observe.
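
The dequeue step described in the quoted text above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel code from the patch: `ToyQueue`, `dequeue_bulk` and `byte_budget` are hypothetical names, packets are represented by their length in bytes, and the byte budget is simply passed in by the caller.

```python
from collections import deque
from threading import Lock


class ToyQueue:
    """Toy stand-in for a qdisc feeding one TX queue (hypothetical, not kernel code)."""

    def __init__(self):
        self.lock = Lock()
        self.packets = deque()       # each entry is a packet length in bytes
        self.lock_acquisitions = 0   # how many times the lock was taken

    def enqueue(self, length):
        self.packets.append(length)

    def dequeue_bulk(self, byte_budget):
        """Take packets under a single lock until the byte budget is used up."""
        batch = []
        with self.lock:
            self.lock_acquisitions += 1
            # Assumption: the first packet is always taken, so one large
            # packet cannot stall the queue; later packets are taken only
            # while some budget remains.
            while self.packets:
                length = self.packets.popleft()
                batch.append(length)
                byte_budget -= length
                if byte_budget <= 0:
                    break
        return batch
```

With five 1500-byte packets queued and a budget of 3000 bytes, one call returns two packets for a single lock acquisition. With only one packet queued, the same call returns a burst of one, which is the case the reply above is concerned about: the loop and its checks still run, but nothing is amortized.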
There should be a benefit when transmitting at a high rate, as we know that
reducing locking is generally a win.
You mean amortizing the cost of the lock not removing a lock?
Yes, of course. That is if the added code ends up being hit
meaningfully. Jesper said (and it was my experience as well)
that it was _hard_ to achieve bulking in such a case.
The fear here is that if, in the common case (if we say the
bulk transfer is a common case), that code is in fact reduced to
operating per-packet as opposed to on a burst of packets, then there is
no win.
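
The trade-off between amortizing the lock and paying for extra code can be put into a toy cost model. This is a sketch only: `cost_per_packet` is a hypothetical helper, and `lock_cost`, `work_cost` and `check_cost` are arbitrary units chosen for illustration, not measurements from this thread.

```python
def cost_per_packet(lock_cost, work_cost, burst, check_cost=0.0):
    """Average cost per packet when `burst` packets share one lock acquisition.

    check_cost models the extra work the bulking code adds to each dequeue
    call (assumption: it is paid once per call, whether or not a burst forms).
    """
    if burst < 1:
        raise ValueError("burst must be at least 1")
    return (lock_cost + check_cost) / burst + work_cost


# The three test cases from the message, with made-up unit costs.
case_a = cost_per_packet(lock_cost=100, work_cost=50, burst=1)                 # unchanged
case_b = cost_per_packet(lock_cost=100, work_cost=50, burst=8, check_cost=5)   # bulking used
case_c = cost_per_packet(lock_cost=100, work_cost=50, burst=1, check_cost=5)   # bulking not used
```

In this model case (c) is always more expensive than case (a) by the added check cost, and case (b) only pays off once the burst size exceeds 1 + check_cost / lock_cost. Whether real workloads reach such burst sizes is what the proposed tests would have to show.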
The tests should clarify, no?

cheers,
jamal