Re: [Xen-devel] [patch 14/33] xen: xen time implementation
From: Jeremy Fitzhardinge <hidden>
Date: 2007-06-06 10:05:40
Also in:
lkml, xen-devel
Jan Beulich wrote:
Xen itself knows to deal with this (by using an error correction factor to slow down the local [TSC-based] clock), but for the kernel such a situation may be fatal: If clocksource->cycle_last was most recently set on a CPU with shadow->tsc_to_nsec_mul sufficiently different from that where getnstimeofday() is being used, timekeeping.c's __get_nsec_offset() will calculate a huge nanosecond value (due to cyc2ns() doing unsigned operations), worth abut 4000s. This value may then be used to set a timeout that was intended to be a few milliseconds, effectively yielding a hung app (and perhaps system).
Hm. I had a similar situation in the stolen time code, and I ended up using signed values so I could clamp at zero. Though that might have been another bug; either way, the clamp is still there. I wonder if cyc2ns might not be better using signed operations? Or perhaps better, the time code should endevour to do things on a completely per-cpu basis (haven't really given this any thought).
I'm sure the time keeping code can't deal with negative values returned from __get_nsec_offset() (timespec_add_ns() is an example, used in __get_realtime_clock_ts()), otherwise a potential solution might have been to set the clock source's multiplier and shift to one and zero respectively.
I don't quite follow you here, but wouldn't setting the multiplier/shift to 1/0 preclude being able to warp the clocksource with ntp?
But I think that a clock source can be expected to be monotonic anyway, which Xen's interpolation mechanism doesn't guarantee across multiple CPUs. (I'm actually beginning to think that this might also be the reason for certain test suites occasionally reporting timeouts to fire early.)
Does the kernel expect the tsc clocksource to be completely monotonic across cpus? Any form of cpu-local clocksource is going to have this problem; I wonder if clocksources can really only be useful if they're always referring to a single system-wide time reference - seems like a bit of a limitation.
Unfortunately so far I haven't been able to think of a reasonable solution to this - a simplistic approach like making xen_clocksource_read() check the value it is about to return against the last value it returned doesn't seem to be a good idea (time might appear to have stopped over some period of time otherwise), nor does attempting to adjust the shadowed tsc_to_nsec_mul values (because the kernel can't know whether it should boost the lagging CPU or throttle the rushing one).
I once had some code in there to do that, implemented in very boneheaded
way with a spinlock to protect the "last time returned" variable. I
expect there's a better way to implement it.
J