Thread: 57 messages, 9 authors, 2007-07-04

Re: [WIP][PATCHES] Network xmit batching

From: Evgeniy Polyakov <hidden>
Date: 2007-06-08 12:26:40

On Fri, Jun 08, 2007 at 07:31:07AM -0400, jamal (hadi@cyberus.ca) wrote:
On Fri, 2007-08-06 at 12:38 +0400, Evgeniy Polyakov wrote:
On Thu, Jun 07, 2007 at 06:23:16PM -0400, jamal (hadi@cyberus.ca) wrote:
I believe both are called with no lock. The idea is to avoid the lock
entirely when unneeded. That code may end up finding that the packet
[..]
+	netif_tx_lock_bh(odev);
+	if (!netif_queue_stopped(odev)) {
+
+		idle_start = getCurUs();
+		pkt_dev->tx_entered++;
+		ret = odev->hard_batch_xmit(&odev->blist, odev);
[..]
The same applies to *_gso case.
You missed an important piece which is grabbing of
__LINK_STATE_QDISC_RUNNING
But the lock is still being held - or was there no intention to reduce
lock usage? As far as I read Krishna's mail, lock usage was not an issue,
so that hunk should probably be dropped from the analysis.
 
Without lock that would be wrong - it accesses hardware.
We are achieving the goal of only a single CPU entering that path. Are
you saying that is not good enough?
Then why was essentially the same code (the current batch_xmit callback)
previously always called with interrupts disabled? Aren't there
some watchdog/link/poll/whatever issues present?
and i also do not know, what service demand is :)
From the explanation it seems to be how much cpu was used while sending.
Do you have any suggestions for computing cpu use?
in pktgen i added code to count how many microsecs were used in
transmitting.
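The microsecond counting mentioned above can be sketched in userspace roughly as follows. This is only an illustration, not the pktgen patch: cur_us() is a stand-in for pktgen's getCurUs(), and struct tx_timer, xmit_fn, timed_xmit() and fake_xmit() are hypothetical names introduced here.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in microseconds; userspace stand-in for getCurUs(). */
static uint64_t cur_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical accumulator for time spent inside the transmit call. */
struct tx_timer {
	uint64_t total_us;	/* microseconds spent in the callback */
	uint64_t calls;		/* number of timed invocations */
};

/* Hypothetical transmit callback type. */
typedef int (*xmit_fn)(void *ctx);

/* Run one transmit call and add its duration to the accumulator. */
static int timed_xmit(struct tx_timer *t, xmit_fn fn, void *ctx)
{
	uint64_t start = cur_us();
	int ret = fn(ctx);

	t->total_us += cur_us() - start;
	t->calls++;
	return ret;
}

/* Dummy callback for experiments: counts how often it was invoked. */
static int fake_xmit(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}
```

Dividing total_us by the number of packets sent gives a per-packet figure that can be compared between the batched and unbatched runs.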
Something, that anyone can understand :)
For example /proc stats: although it is not very accurate, it is a
really usable parameter from a userspace point of view.
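A rough userspace sketch of the /proc approach: sample the aggregate "cpu" line of /proc/stat before and after a run and compare the deltas. The struct and function names are hypothetical, the field order (user, nice, system, idle, iowait, irq, softirq) is assumed from 2.6-era kernels, and steal time is ignored.

```c
#include <stdio.h>

/* Hypothetical snapshot of the aggregate counters, in jiffies. */
struct cpu_sample {
	unsigned long long busy;	/* user + nice + system + irq + softirq */
	unsigned long long total;	/* busy + idle + iowait */
};

/*
 * Parse the aggregate "cpu ..." line only (not the per-cpu "cpuN" lines).
 * Returns 0 on success, -1 if fewer than four counters were found.
 */
static int parse_cpu_line(const char *line, struct cpu_sample *out)
{
	unsigned long long user, nice, sys, idle;
	unsigned long long iowait = 0, irq = 0, softirq = 0;
	int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu",
		       &user, &nice, &sys, &idle, &iowait, &irq, &softirq);

	if (n < 4)
		return -1;
	out->busy = user + nice + sys + irq + softirq;
	out->total = out->busy + idle + iowait;
	return 0;
}

/* Read the first line of /proc/stat into a sample. */
static int read_cpu_sample(struct cpu_sample *out)
{
	char line[256];
	int ret = -1;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_cpu_line(line, out);
	fclose(f);
	return ret;
}

/* Percentage of non-idle time between two samples. */
static double busy_percent(const struct cpu_sample *before,
			   const struct cpu_sample *after)
{
	unsigned long long dt = after->total - before->total;

	if (dt == 0)
		return 0.0;
	return 100.0 * (double)(after->busy - before->busy) / (double)dt;
}
```

Calling read_cpu_sample() right before and right after a pktgen run and passing both samples to busy_percent() gives a system-wide number, so anything else running on the box is counted too.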
The result looks good, but I still do not understand how it came about;
that is why I'm not that excited about the idea - I just do not know it
in detail.
To add to KK's explanation in the other email:
Essentially the value is in amortizing the cost of barriers and IO per
packet. For example the queue lock is held/released only once per X
packets. DMA kicking which includes both a PCI IO write and mbs is done
only once per X packets. There is still a lot of room for improvement
of such IO;
Btw, what is the size of the packet in pktgen in your tests? Likely it
is small, since the result is that good. That can explain a lot.
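The amortization argument quoted above (one lock round trip and one DMA kick per X packets rather than per packet) can be illustrated with a toy counting model. This is a sketch under assumptions, not driver code: struct xmit_cost and simulate_xmit() are hypothetical, and the model only counts operations, it does not measure them.

```c
#include <assert.h>

/* Hypothetical counters; these are not kernel structures. */
struct xmit_cost {
	unsigned long lock_ops;		/* queue lock acquire/release pairs */
	unsigned long dma_kicks;	/* doorbell writes (PCI IO write + barriers) */
	unsigned long pkts_sent;	/* packets handed to the device */
};

/* Count per-batch costs for sending npkts packets, at most batch at a time. */
static struct xmit_cost simulate_xmit(unsigned long npkts, unsigned long batch)
{
	struct xmit_cost c = {0, 0, 0};

	assert(batch > 0);
	while (c.pkts_sent < npkts) {
		unsigned long left = npkts - c.pkts_sent;
		unsigned long n = left < batch ? left : batch;

		c.lock_ops++;		/* lock taken once for the whole batch */
		c.pkts_sent += n;	/* n descriptors queued under the lock */
		c.dma_kicks++;		/* one kick covers all n descriptors */
	}
	return c;
}
```

With these assumptions, 1000 packets cost 1000 lock round trips and 1000 kicks at batch size 1, but only 63 of each at batch size 16. The fixed per-batch cost is a larger share of the total for small packets, which fits the question about packet size.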
cheers,
jamal
-- 
	Evgeniy Polyakov