Re: [patch 14/33] xen: xen time implementation
From: Andi Kleen <hidden>
Date: 2007-06-06 10:26:41
Also in: lkml, xen-devel
On Wednesday 06 June 2007 12:05:22 Jeremy Fitzhardinge wrote:
> Jan Beulich wrote:
> > Xen itself knows to deal with this (by using an error correction factor to slow down the local [TSC-based] clock), but for the kernel such a situation may be fatal: if clocksource->cycle_last was most recently set on a CPU with shadow->tsc_to_nsec_mul sufficiently different from that where getnstimeofday() is being used, timekeeping.c's __get_nsec_offset() will calculate a huge nanosecond value (due to cyc2ns() doing unsigned operations), worth about 4000s. This value may then be used to set a timeout that was intended to be a few milliseconds, effectively yielding a hung app (and perhaps system).
>
> Hm. I had a similar situation in the stolen time code, and I ended up using signed values so I could clamp at zero. Though that might have been another bug; either way, the clamp is still there. I wonder if cyc2ns might not be better using signed operations? Or perhaps better, the time code should endeavour to do things on a completely per-cpu basis (haven't really given this any thought).

This is being worked on.
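
For illustration, the wrap-around described above and the "clamp at zero" idea can be sketched in userspace C. This is not the kernel's code: cyc2ns_unsigned() and cyc2ns_clamped() are hypothetical stand-ins, and the mult/shift values used in the note below (1 << 22 and 22) are assumptions picked for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned cycles-to-ns conversion:
 * multiply, let the product wrap modulo 2^64, then shift. */
static uint64_t cyc2ns_unsigned(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (cycles * mult) >> shift;
}

/* Hypothetical signed variant: a negative (or zero) delta, i.e. the
 * clock appearing to run backwards, is clamped to zero. */
static uint64_t cyc2ns_clamped(int64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    if (cycles <= 0)
        return 0;
    return ((uint64_t)cycles * mult) >> shift;
}
```

With the assumed mult = 1 << 22 and shift = 22, a delta of -500 (now = 1000, cycle_last = 1500) gives 2^42 - 500 = 4398046510604 ns from the unsigned version, roughly 4398 s, which is in line with the "about 4000s" figure quoted above. The clamped version returns 0 for the same input.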
> > Unfortunately so far I haven't been able to think of a reasonable solution to this - a simplistic approach like making xen_clocksource_read() check the value it is about to return against the last value it returned doesn't seem to be a good idea (time might appear to have stopped over some period of time otherwise), nor does attempting to adjust the shadowed tsc_to_nsec_mul values (because the kernel can't know whether it should boost the lagging CPU or throttle the rushing one).
>
> I once had some code in there to do that, implemented in a very boneheaded way with a spinlock to protect the "last time returned" variable. I expect there's a better way to implement it.

But any per-CPU setup likely needs this to avoid non-monotonicity.

-Andi
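
For illustration, one possible lock-free shape of the "last time returned" check discussed above is a compare-and-swap loop. This is only a sketch, not the code from the thread: monotonic_read() and last_returned_ns are hypothetical names, and it uses C11 <stdatomic.h> in userspace, whereas kernel code would use the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared "last time returned" value, in nanoseconds. */
static _Atomic uint64_t last_returned_ns;

/* Return a value that never goes backwards across callers.
 * raw_ns is the caller's own (possibly lagging) clock reading. */
uint64_t monotonic_read(uint64_t raw_ns)
{
    uint64_t last = atomic_load(&last_returned_ns);

    do {
        /* Another reader already returned a later time: repeat it. */
        if (raw_ns <= last)
            return last;
        /* On failure the CAS reloads 'last' and the loop re-checks. */
    } while (!atomic_compare_exchange_weak(&last_returned_ns, &last, raw_ns));

    return raw_ns;
}
```

This avoids the spinlock but keeps the drawback raised in the quoted text: a lagging CPU keeps getting the same value back until its own clock catches up, so time can appear to stand still for a while.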