On Thursday, 29 April 2010 at 20:23 +0200, Andi Kleen wrote:
On Thu, Apr 29, 2010 at 07:56:12PM +0200, Eric Dumazet wrote:
On Thursday, 29 April 2010 at 19:42 +0200, Andi Kleen wrote:
Andi, what do you think of this one?
Don't we have a function to send an IPI to an individual CPU instead?
That's what this function already does. You only set a single CPU
in the target mask, right?
IPIs are unfortunately always a bit slow. Nehalem-EX systems have X2APIC
which is a bit faster for this, but that's not available in the lower
end Nehalems. But even then it's not exactly fast.
I don't think the IPI primitive can be optimized much. It's not a cheap
operation.
If it's a problem do it less often and batch IPIs.
It's essentially the same problem that interrupt mitigation and NAPI
solve for NICs. I guess we just need a suitable mitigation mechanism.
Of course that would move more work to the sending CPU again, but
perhaps there's no alternative. I guess you could make it cheaper by
minimizing access to packet data.
-Andi
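A minimal sketch of the mitigation idea above, written as single-threaded user-space C rather than kernel code: the sender queues work for a remote CPU and raises a (simulated) IPI only when none is already pending, so a burst of packets costs one IPI. Every name here (`struct remote_queue`, `enqueue_work`, `drain_work`, `ipis_sent`) is hypothetical, and a real implementation would also need locking or atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical per-CPU backlog; not an actual kernel structure. */
struct remote_queue {
    int items[QUEUE_CAP];
    size_t len;
    bool ipi_pending;    /* an IPI was raised and not yet handled */
    unsigned ipis_sent;  /* counter standing in for the real IPI */
};

/* Sender side: queue one item; returns false if the queue is full.
 * An IPI is "sent" only if the target has not been signalled yet. */
static bool enqueue_work(struct remote_queue *q, int item)
{
    assert(q != NULL);
    if (q->len == QUEUE_CAP)
        return false;
    q->items[q->len++] = item;
    if (!q->ipi_pending) {
        q->ipi_pending = true;
        q->ipis_sent++;
    }
    return true;
}

/* Target side: handle everything queued so far and re-arm the IPI.
 * Returns the number of items handled. */
static size_t drain_work(struct remote_queue *q)
{
    assert(q != NULL);
    size_t handled = q->len;
    q->len = 0;
    q->ipi_pending = false;
    return handled;
}
```

With this scheme, ten back-to-back `enqueue_work` calls raise a single IPI, and the next one is raised only after `drain_work` has run on the target.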
Well, IPIs are already batched, and the rate is auto-adaptive.
After various changes, things seem to be going better; maybe there is
something related to cache line thrashing.
I 'solved' it by using idle=poll, but you might take a look at
clockevents_notify (acpi_idle_enter_bm) abuse of a shared and highly
contended spinlock...
acpi_idle_enter_bm should not be executed on a Nehalem; it's obsolete.
If it does run on your system, something is wrong.
Ahh, that rings a bell. There's one issue: if the remote CPU is in a very
deep idle state, it could take a long time to wake it up. Nehalem has deeper
sleep states than earlier CPUs. When this happens, the IPI sender will be slow
too, I believe.
Are the target CPUs idle?
Yes, mostly, but with about 200,000 wakeups per second, I would say...
If a CPU in a deep state receives an IPI and processes a softirq, should it
return to the deep state immediately, or should it wait for some
milliseconds?
Perhaps we need to feed some information to cpuidle's governor to prevent this problem.
idle=poll is very drastic; better to limit to C1.
How can I do this?
Thanks!
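
One possible way to cap idle depth without idle=poll, offered as a sketch and not as the answer given in this thread, is the PM QoS file /dev/cpu_dma_latency: a process writes a 32-bit latency bound in microseconds, and the request holds for as long as the file descriptor stays open. The helper name `request_max_latency` is hypothetical, and the bound that actually keeps a given machine in C1 depends on the exit latencies its C-states report, so any value passed in is an assumption. The path is a parameter so that the helper can be tried against an ordinary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Open the PM QoS file at `path` and request a maximum wakeup latency
 * of `usec` microseconds. Returns the open descriptor, or -1 on error.
 * The request is dropped as soon as the descriptor is closed, so the
 * caller must keep it open while the limit should apply. */
static int request_max_latency(const char *path, int32_t usec)
{
    assert(path != NULL && usec >= 0);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The interface expects the value as a raw 32-bit integer. */
    if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would pass "/dev/cpu_dma_latency" (this usually needs root) and hold the returned descriptor until the limit is no longer wanted. A coarser alternative is the boot parameter processor.max_cstate=1 for the ACPI idle driver.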