Thread: 74 messages, 16 authors, 2009-09-01

Re: RFC: THE OFFLINE SCHEDULER

From: raz ben yehuda <hidden>
Date: 2009-08-28 15:22:45
Also in: lkml

On Fri, 2009-08-28 at 09:25 -0400, Rik van Riel wrote:
> raz ben yehuda wrote:
>> yes. latency is a crucial property.
> In the case of network packets, wouldn't you get a lower
> latency by transmitting the packet from the CPU that
> knows the packet should be transmitted, instead of sending
> an IPI to another CPU and waiting for that CPU to do the
> work?

Hello Rik,
If I understand you correctly, you are saying that I pass 1.5K packets to
an offline CPU?
If so, that is not what I do, because you are quite right: it would not
make any sense.

I do not pass packets to an offline CPU; I pass assignments. An
assignment is a buffer plus some context describing what to do with it
(like aio), and a buffer is ~1MB. Also, the offline processor owns the
network interface as its own interface. No two offline processors transmit
over a single interface (I modified the bonding driver to work with offline
processors for that). I am aware of per-processor network queues, but
benchmarks proved this approach was better (I do not have these benchmarks
now).

Also, these engines do not release any sk_buffs to the operating system;
the packets are reused over and over to reduce memory allocation latency
and cache misses.

In some cases I also disabled the transmit interrupts and released
packets in an offline context (after --skb->users the count was still
greater than 0, so they were not really released). I learned this from the
Chelsio driver. This way, I removed even more load from the operating
system. It proved to be better in large 1Gbps arrays, and in some variants
of the code it let me remove atomic_inc/atomic_dec; atomic operations cost
a lot.
With MSI cards I did not find it useful. In the example I showed, I use MSI
and the system is almost idle.

Also, as I recall, an IPI will not be delivered to an offloaded processor;
offsched uses an NMI instead.
Finally, I apologize if any of this correspondence seems as if I am trying
to promote offsched. I am not.

> Inter-CPU communication has always been the bottleneck
> when it comes to SMP performance.  Why does adding more
> inter-CPU communication make your system faster, instead
> of slower like one would expect?