Re: powerpc: mitigate impact of decrementer reset
From: Paul Clarke <hidden>
Date: 2014-11-05 17:06:45
Sorry it took me so long to get back to this...

On 10/07/2014 09:52 PM, Michael Ellerman wrote:
On Tue, 2014-07-10 at 19:13:24 UTC, Paul Clarke wrote:
The POWER ISA defines an always-running decrementer which can be used to schedule interrupts after a certain time interval has elapsed. The decrementer counts down at the same frequency as the Time Base, which is 512 MHz. The maximum value of the decrementer is 0x7fffffff. This works out to a maximum interval of about 4.19 seconds.

If a larger interval is desired, the kernel will set the decrementer to its maximum value and reset it after it expires (underflows) a sufficient number of times until the desired interval has elapsed.

The negative effect of this is that an unwanted latency spike will impact normal processing at most every 4.19 seconds. On an IBM POWER8-based system, this spike was measured at about 25-30 microseconds, much of which was basic, opportunistic housekeeping tasks that could otherwise have waited.

This patch short-circuits the reset of the decrementer, exiting after the decrementer reset, but before the housekeeping tasks if the only need for the interrupt is simply to reset it. After this patch, the latency spike was measured at about 150 nanoseconds.
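The 4.19-second figure in the changelog can be checked with a few lines of C. This is only a sketch of the arithmetic, not kernel code: `DEC_MAX`, `dec_max_interval_ns()` and `dec_loads_needed()` are hypothetical names chosen for illustration, and the 512 MHz Time Base frequency is taken from the changelog.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value the decrementer can hold, as stated in the changelog. */
#define DEC_MAX 0x7fffffffULL

/* Longest interval a single decrementer load can cover, in nanoseconds,
 * for a Time Base running at tb_hz ticks per second. */
static uint64_t dec_max_interval_ns(uint64_t tb_hz)
{
    assert(tb_hz > 0);
    return DEC_MAX * 1000000000ULL / tb_hz;
}

/* Number of decrementer loads needed to cover an interval of `ticks`
 * Time Base ticks. Every load except the last one expires only so that
 * the decrementer can be reset. */
static uint64_t dec_loads_needed(uint64_t ticks)
{
    return (ticks + DEC_MAX - 1) / DEC_MAX;
}
```

With a 512 MHz Time Base, `dec_max_interval_ns(512000000ULL)` evaluates to 4194303998 ns, i.e. about 4.19 seconds. A 10-second interval (`10 * 512000000ULL` ticks) needs 3 loads, so two of the three expiries exist only to reset the decrementer.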
Thanks for the excellent changelog. But this patch makes me a bit nervous :) Do you know where the latency is coming from? Is it primarily the irq work?
Yes, it is all under irq_enter (measured at ~10us) and irq_exit (~12us).
If so I'd prefer if we could move the short circuit into __timer_interrupt() itself. That way we'd still have the trace points usable, and it would hopefully result in less duplicated logic.
But irq_enter and irq_exit are called in timer_interrupt, before __timer_interrupt is called. I don't see how that helps. The time spent in __timer_interrupt is minuscule by comparison.

Are you suggesting that irq_enter/exit be moved into __timer_interrupt as well? (I'm not sure how that would impact the existing call to __timer_interrupt from tick_broadcast_ipi_handler? And if there is no impact, what's the point of separating timer_interrupt and __timer_interrupt?)

Regards,
PC
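To make the ordering under discussion concrete, here is a small user-space model of the idea, not the actual patch or kernel code. Every name in it (`struct model_cpu`, `model_rearm()`, `model_timer_interrupt()`) is hypothetical, and the counters merely stand in for the work done by irq_enter(), __timer_interrupt() and irq_exit(). It assumes the short circuit is taken whenever the next real timer event is still in the future.

```c
#include <assert.h>
#include <stdint.h>

#define DEC_MAX 0x7fffffffULL   /* largest decrementer value */

/* Hypothetical per-CPU state for the model. */
struct model_cpu {
    uint64_t now;                /* current Time Base value */
    uint64_t next_event;         /* Time Base value of the next real event */
    uint64_t dec;                /* value last loaded into the decrementer */
    unsigned housekeeping_runs;  /* stands in for irq_enter()/irq_exit() */
    unsigned events_handled;     /* stands in for __timer_interrupt() */
};

/* Reload the decrementer, clamped to what it can hold. */
static void model_rearm(struct model_cpu *c)
{
    assert(c->next_event > c->now);
    uint64_t delta = c->next_event - c->now;
    c->dec = delta > DEC_MAX ? DEC_MAX : delta;
}

/* Model of the decrementer interrupt entry point with the short circuit. */
static void model_timer_interrupt(struct model_cpu *c)
{
    if (c->now < c->next_event) {
        /* The decrementer only underflowed: re-arm it and leave
         * before any housekeeping runs. */
        model_rearm(c);
        return;
    }
    c->housekeeping_runs++;      /* irq_enter() stand-in */
    c->events_handled++;         /* __timer_interrupt() stand-in */
    c->housekeeping_runs++;      /* irq_exit() stand-in */
}
```

In this model, an interrupt that arrives while `now < next_event` only updates `dec`, while an interrupt at or after `next_event` takes the full path. Moving the check below the irq_enter() stand-in would leave the housekeeping cost in place, which is the concern raised above.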