Re: Time keeping while suspended in the presence of persistent clock drift

From: Joel Daniels <hidden>
Date: 2021-12-15 22:44:08
Also in: lkml

Hi John,

Thanks for your feedback.

> > [A] On machines with a persistent clock how is userspace supposed
> >     to be sure what drift to measure? Can it assume that the drift
> >     of the persistent clock used for sleep time injection is the
> >     same as the drift of /dev/rtc? This seems dangerous.
>
> Yea, there can be multiple RTCs as well.

> > [B] Sleep time injection can come from the "persistent clock" or,
> >     if there is no persistent clock, from an RTC driver. I'd like
> >     to correct for drift from the persistent clock but not touch
> >     the RTC driver sleep time injection mechanism. Is this
> >     acceptable or do people feel that any drift correction should
> >     work with both mechanisms in order to ensure a polished
> >     interface?
>
> This dual interface comes from the desire to support both the more
> atomic/earlier correction we can do w/ the persistent_clock interface
> while holding the timekeeping lock, while also supporting RTC devices
> that may sleep when being read, or may have dependencies that aren't
> ready that early in resume.
>
> Admittedly having two separate abstractions here is a bit of a pain,
> and fixing just one side doesn't make it better.

Thanks; that makes sense to me. I suppose I ought to have a separate
sleep-time-injection drift correction parameter per RTC? That way the
kernel wouldn't do something silly if somebody hotplugs one RTC while
removing another. The persistent clock is almost always exposed as an
RTC as well, so either I could try to be very clever and make the
persistent clock share the drift correction parameter of its
corresponding RTC or I could just maintain a separate correction for
the persistent clock.
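To make the per-RTC idea concrete, here is a minimal userspace C sketch, assuming the correction is stored as a rate error in parts per billion (ppb), positive when the clock gains time. The table, `struct sleep_drift`, `lookup_drift_ppb()` and `correct_sleep_ns()` are hypothetical names for illustration, not existing kernel interfaces, and `__int128` is a GCC/Clang extension used only to avoid overflow on long sleeps.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-clock correction: rate error in parts per billion.
 * Positive means the clock gains time (runs fast). */
struct sleep_drift {
	const char *clock;  /* e.g. "rtc0", or "persistent" for the persistent clock */
	int32_t ppb;
};

/* Hypothetical values; a separate row keeps the persistent clock independent. */
static const struct sleep_drift drift_table[] = {
	{ "persistent", 10000 },
	{ "rtc0",       10000 },
	{ "rtc1",       -2500 },
};

/* Return the correction for a clock; unknown clocks get no correction. */
static int32_t lookup_drift_ppb(const char *clock)
{
	for (size_t i = 0; i < sizeof(drift_table) / sizeof(drift_table[0]); i++)
		if (strcmp(drift_table[i].clock, clock) == 0)
			return drift_table[i].ppb;
	return 0;
}

/* A clock that gains ppb per 1e9 reports measured = true * (1e9 + ppb) / 1e9,
 * so the true sleep time is measured * 1e9 / (1e9 + ppb). */
static int64_t correct_sleep_ns(int64_t measured_ns, int32_t ppb)
{
	__int128 num = (__int128)measured_ns * 1000000000;
	__int128 den = (__int128)1000000000 + ppb;
	return (int64_t)(num / den);
}
```

With one row per clock, hotplugging or removing an RTC only changes which row applies, and an unknown clock simply falls back to no correction.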

> > [C] Some users may want to correct for drift during suspend-to-RAM
> >     but during suspend-to-disk they might boot into some other
> >     operating system which itself sets the CMOS RTC. Hopefully,
> >     this could be solved from userspace by changing the drift
> >     correction parameter to 0 just before a suspend-to-disk
> >     operation.
>
> Oof. This feels particularly complex and fragile to try to address.

Yes, I think we should ignore this issue and treat all suspend/resume
cycles identically. People who regularly dual-boot can just not use
the new feature.

> Personally, I'm not sure this warrants adding new userland interfaces
> for. I'd probably lean towards having the RTC framework internally
> measure and correct for drift, rather than adding an extra knob in
> userland.

Measuring RTC drift is hard. The standard PC RTC has only one-second
resolution, so you have to wait for the "edge" of a tick and measure
drift over an extended period of time. If you have some NTP daemon
slewing your system clock while you try to measure RTC drift then
you will get garbage. If your motherboard gets hot enough then your
RTC will run at a different rate while the machine is on than while
it is off.
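The one-second resolution is why edge timing matters: if each reading can be off by up to a second, a one-day measurement carries roughly 1/86400 ≈ 11.6 ppm of uncertainty. Timestamping the moment the RTC seconds value changes ties each reading to the system clock far more tightly, limited mainly by how often you poll. The C sketch below only does the arithmetic on two such edge observations; capturing the edges (for example by polling /dev/rtc0 with the RTC_RD_TIME ioctl) is assumed and not shown, and `struct rtc_edge` and `rtc_drift_ppm()` are hypothetical names.

```c
#include <stdint.h>

/* One observation of an RTC seconds rollover ("edge"). */
struct rtc_edge {
	int64_t sys_ns;   /* reference (system) clock in ns when the edge was seen */
	int64_t rtc_sec;  /* RTC seconds value just after the edge */
};

/* RTC rate error in parts per million between two edges; positive means the
 * RTC gains time relative to the reference clock. Assumes the reference clock
 * was not being slewed (e.g. by an NTP daemon) in between. */
static double rtc_drift_ppm(struct rtc_edge first, struct rtc_edge last)
{
	double sys_elapsed = (double)(last.sys_ns - first.sys_ns);
	double rtc_elapsed = (double)(last.rtc_sec - first.rtc_sec) * 1e9;
	return (rtc_elapsed - sys_elapsed) / sys_elapsed * 1e6;
}
```

For example, an RTC that advances 86401 seconds while the reference clock advances exactly one day is running about 11.6 ppm fast.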

I know of three programs that measure RTC drift today:

  # hwclock: you must use it to set the RTC twice, the second time
    with the "--update-drift" argument. The manual suggests waiting
    one day between calls. The drift and offset information is
    stored in /etc/adjtime. On boot "hwclock --hctosys" will use this
    to set the system clock correctly.

  # adjtimex (the program, not the syscall) when run with the
    "--compare" option. It uses a least-squares estimate from multiple
    samples, which by default are taken 10 seconds apart.

  # chrony with the "rtcfile" directive. It tracks the RTC over time
    to measure its offset and drift similarly to how it tracks the
    system clock drift. Tracking information is saved into
      /var/lib/chrony/rtc
    and can be used (via "chronyd -s") to set the system clock
    correctly on next boot.
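For the multi-sample approach, the drift is the slope of a straight-line fit of RTC offset against elapsed time. Below is an ordinary least-squares slope in C, written in the spirit of the adjtimex "--compare" description above rather than taken from its source; `ls_slope()` is a hypothetical helper. With `x` as elapsed time in seconds and `y` as the RTC offset (RTC time minus reference time) in seconds, the slope multiplied by 1e6 is the drift in ppm.

```c
#include <math.h>
#include <stddef.h>

/* Ordinary least-squares slope of y against x. Returns NAN when there are
 * fewer than two samples or all x values are equal. */
static double ls_slope(const double *x, const double *y, size_t n)
{
	if (n < 2)
		return NAN;

	double mean_x = 0.0, mean_y = 0.0;
	for (size_t i = 0; i < n; i++) {
		mean_x += x[i];
		mean_y += y[i];
	}
	mean_x /= (double)n;
	mean_y /= (double)n;

	double sxy = 0.0, sxx = 0.0;
	for (size_t i = 0; i < n; i++) {
		sxy += (x[i] - mean_x) * (y[i] - mean_y);
		sxx += (x[i] - mean_x) * (x[i] - mean_x);
	}
	return sxx == 0.0 ? NAN : sxy / sxx;
}
```

Fitting many samples averages out some of the per-read quantisation error that a two-point estimate is fully exposed to.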

Any method of measuring the drift is going to need to persist the
drift coefficient to disk so that it can set the system clock
correctly on boot. I think it would be best for the kernel to use this
same coefficient.
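As an example of sharing the coefficient, hwclock stores its systematic drift rate in /etc/adjtime as the first number on the first line, in seconds per day (see adjtime_config(5)). A minimal C sketch that reads that value from the file contents and converts it to parts per billion, the unit assumed here for a hypothetical kernel-side parameter, could look like this. `adjtime_drift_ppb()` is a hypothetical name, and hwclock's sign convention should be checked before feeding the result to anything.

```c
#include <stdio.h>

/* Parse the drift rate (seconds per day) from the first field of
 * /etc/adjtime contents and convert it to parts per billion.
 * Returns 0 on success, -1 if the first field is not a number. */
static int adjtime_drift_ppb(const char *contents, double *ppb)
{
	double sec_per_day;

	if (sscanf(contents, "%lf", &sec_per_day) != 1)
		return -1;
	/* 1 s/day = 1e9 / 86400 ppb, about 11574 ppb (11.6 ppm). */
	*ppb = sec_per_day * 1e9 / 86400.0;
	return 0;
}
```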

> Alternatively I'd go very simple and just put the correction factor in
> a boot argument.

This works for my use case, though it won't be useful for a general
distro. Would you use one argument regardless of where the sleep time
injection comes from, or would you tie it to the persistent clock
and/or a specific RTC?
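For the boot-argument route, kernel code would normally register such a parameter with `early_param()` or `__setup()` and parse the value with `kstrtoint()`. The userspace C sketch below only illustrates the parsing on a command-line string such as the contents of /proc/cmdline; the `sleep_drift_ppb=` name is hypothetical, and the sketch simply takes the first whole-word match.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Find "sleep_drift_ppb=<int>" as a whole word in a kernel command line.
 * Returns 0 and stores the value on success, -1 if absent or malformed. */
static int parse_sleep_drift_ppb(const char *cmdline, long *out)
{
	static const char key[] = "sleep_drift_ppb=";
	const size_t klen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a word. */
		if (p == cmdline || p[-1] == ' ') {
			const char *val = p + klen;
			char *end;

			errno = 0;
			long v = strtol(val, &end, 10);
			if (end == val || errno != 0 ||
			    (*end != '\0' && *end != ' ' && *end != '\n'))
				return -1;
			*out = v;
			return 0;
		}
		p += klen;
	}
	return -1;
}
```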
