Thread (26 messages) 26 messages, 5 authors, 2024-04-16

Re: [PATCH v7] posix-timers: add clock_compare system call

From: Sagi Maimon <hidden>
Date: 2024-03-14 12:19:51
Also in: linux-arch, lkml, netdev

hi Thomas

On Thu, Mar 14, 2024 at 1:12 PM Thomas Gleixner [off-list ref] wrote:
On Thu, Mar 14 2024 at 11:05, Sagi Maimon wrote:
quoted
Some user space applications need to read a couple of different clocks.
Each read requires moving from user space to kernel space.
Reading each clock separately (syscall) introduces extra
unpredictable/unmeasurable delay. Minimizing this delay contributes to user
space actions on these clocks (e.g. synchronization etc).
I asked for a proper description of the actual problem several times now
and you still provide some handwaving blurb. Feel free to ignore me, but
then please don't be surprised if I ignore you too.
Nobody is ignoring your notes, and I address any notes given by any
maintainer in the most serious way.
As far as explaining the actual problem this is the best that I can,
but let me try to explain better:
We did many tests with different CPU loading and compared sampling the
same clock twice,
once in user space and once by using the system call.
We have noticed an improvement up to hundreds of nanoseconds while
using the system call.
Those results improved our ability to sync different PHCs
Also why does reading two random clocks make any sense at all? Your code
allows to read CLOCK_TAI and CLOCK_THREAD_CPUTIME_ID. What for?
Initially we needed to sync some different PHCs for our user space
application, that is why we came with this idea.
The first idea was an IOCTL that returned samples of several PHCs for
the need of synchronization.
Richard Cochran suggested a system call instead, which will add the
ability to get various system clocks, while this
implementation is more complex then IOCTL, I think that he was right,
for future usage.
This is about PTP, no?
yes
Again you completely fail to explain why the existing PTP ioctls are not
sufficient for the purpose and why you need to provide a new interface
which is completely ill defined.
The same as my answer above .....
quoted
arch/x86/entry/syscalls/syscall_64.tbl |   1 +
drivers/ptp/ptp_clock.c                |  34 ++++--
include/linux/posix-clock.h            |   2 +
include/linux/syscalls.h               |   4 +
include/uapi/asm-generic/unistd.h      |   5 +-
kernel/time/posix-clock.c              |  25 +++++
kernel/time/posix-timers.c             | 138 +++++++++++++++++++++++++
kernel/time/posix-timers.h             |   2 +
This needs to be split up into:

     1) Infrastructure in posix-timers.c
     2) Wire up the syscall in x86
     3) Infrastructure in posix-clock.c
     4) Usage in ptp_clock.c

and not as a big lump of everything.
I know, but I think the benefit worth it
I agree that an IOCTL won't require such a big changes in the kernel code
quoted
+/**
+ * clock_compare - Get couple of clocks time stamps
So the name of the syscall suggest that it compares two clocks, but the
actual functionality is to read two clocks.

This does not make any sense at all. Naming matters.
you are right the name was suggested by Richard when the main idea was
returning the offset between the clocks.
If you have a better name, please tell me
quoted
+ * @clock_a: clock a ID
+ * @clock_b: clock b ID
+ * @tp_a:            Pointer to a user space timespec64 for clock a storage
+ * @tp_b:            Pointer to a user space timespec64 for clock b storage
+ *
+ * clock_compare gets time sample of two clocks.
+ * Supported clocks IDs: PHC, virtual PHC and various system clocks.
+ *
+ * In case of PHC that supports crosstimespec and the other clock is Monotonic raw
+ * or system time, crosstimespec will be used to synchronously capture
+ * system/device time stamp.
+ *
+ * In other cases: Read clock_a twice (before, and after reading clock_b) and
+ * average these times – to be as close as possible to the time we read clock_b.
+ *
+ * Returns:
+ *   0               Success. @tp_a and @tp_b contains the time stamps
+ *   -EINVAL         @clock a or b ID is not a valid clock ID
+ *   -EFAULT         Copying the time stamp to @tp_a or @tp_b faulted
+ *   -EOPNOTSUPP     Dynamic POSIX clock does not support crosstimespec()
+ **/
+SYSCALL_DEFINE5(clock_compare, const clockid_t, clock_a, const clockid_t, clock_b,
+             struct __kernel_timespec __user *, tp_a, struct __kernel_timespec __user *,
+             tp_b, s64 __user *, offs_err)
+{
+     struct timespec64 ts_a, ts_a1, ts_b, ts_a2;
+     struct system_device_crosststamp xtstamp_a1, xtstamp_a2, xtstamp_b;
+     const struct k_clock *kc_a, *kc_b;
+     ktime_t ktime_a;
+     s64 ts_offs_err = 0;
+     int error = 0;
+     bool crosstime_support_a = false;
+     bool crosstime_support_b = false;
Please read and follow the documentation provided at:

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
quoted
+     kc_a = clockid_to_kclock(clock_a);
+     if (!kc_a) {
+             error = -EINVAL;
+             return error;
What's wrong about 'return -EINVAL;' ?
correct will be fixed on next patch
quoted
+     }
+
+     kc_b = clockid_to_kclock(clock_b);
+     if (!kc_b) {
+             error = -EINVAL;
+             return error;
+     }
+
+     // In case crosstimespec supported and b clock is Monotonic raw or system
+     // time, synchronously capture system/device time stamp
Please don't use C++ comments.
will be fixed on next patch
quoted
+     if (clock_a < 0) {
This is just wrong. posix-clocks ar not the only ones which have a
negative clock id. See clockid_to_kclock()
quoted
+             error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a1);
What's a crosstimespec?

This function fills in a system_device_crosststamp, so why not name it
so that the purpose of the function is obvious?
correct , it will be fixed on next patch
ptp_clock::info::getcrosststamp() has a reasonable name. So why do you
need to come up with something which makes the code obfuscated?
correct , it will be fixed on next patch
quoted
+             if (!error) {
+                     if (clock_b == CLOCK_MONOTONIC_RAW) {
+                             ts_b = ktime_to_timespec64(xtstamp_a1.sys_monoraw);
+                             ts_a1 = ktime_to_timespec64(xtstamp_a1.device);
+                             goto out;
+                     } else if (clock_b == CLOCK_REALTIME) {
+                             ts_b = ktime_to_timespec64(xtstamp_a1.sys_realtime);
+                             ts_a1 = ktime_to_timespec64(xtstamp_a1.device);
+                             goto out;
+                     } else {
+                             crosstime_support_a = true;
Right. If clock_b is anything else than CLOCK_MONOTONIC_RAW or
CLOCK_REALTIME then this is true.
quoted
+                     }
+             }
So in case of an error, this just keeps going with an uninitialized
xtstamp_a1 and if the clock_b part succeeds it continues and operates on
garbage.
On error  xtstamp_a1 will be taken again using clock_get_crosstimespec
so no one will be operating on garbage.
Please explain the problem better, because I don't see it.
quoted
+     }
+
+     // In case crosstimespec supported and a clock is Monotonic raw or system
+     // time, synchronously capture system/device time stamp
+     if (clock_b < 0) {
+             // Synchronously capture system/device time stamp
+             error = kc_b->clock_get_crosstimespec(clock_b, &xtstamp_b);
+             if (!error) {
+                     if (clock_a == CLOCK_MONOTONIC_RAW) {
+                             ts_a1 = ktime_to_timespec64(xtstamp_b.sys_monoraw);
+                             ts_b = ktime_to_timespec64(xtstamp_b.device);
+                             goto out;
+                     } else if (clock_a == CLOCK_REALTIME) {
+                             ts_a1 = ktime_to_timespec64(xtstamp_b.sys_realtime);
+                             ts_b = ktime_to_timespec64(xtstamp_b.device);
+                             goto out;
+                     } else {
+                             crosstime_support_b = true;
+                     }
+             }
+     }
+
+     if (crosstime_support_a)
+             error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a1);
What? crosstime_support_a is only true when the exactly same call
returned success. Why does it need to be called here again?
I wanted to take the three samples as close as possible to each other.
minimum code between the calls, that is why the call is done again.
quoted
+     else
+             error = kc_a->clock_get_timespec(clock_a, &ts_a1);
+
+     if (error)
+             return error;
+
+     if (crosstime_support_b)
+             error = kc_b->clock_get_crosstimespec(clock_b, &xtstamp_b);
+     else
+             error = kc_b->clock_get_timespec(clock_b, &ts_b);
+
+     if (error)
+             return error;
+
+     if (crosstime_support_a)
+             error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a2);
+     else
+             error = kc_a->clock_get_timespec(clock_a, &ts_a2);
+
+     if (error)
+             return error;
The logic and the code flow here are unreadable garbage and there are
zero comments what this is supposed to do.
I will add comments.
please no need to use negative words like "garbage" (not the first time),
please keep it professional and civilized.
quoted
+     if (crosstime_support_a) {
+             ktime_a = ktime_sub(xtstamp_a2.device, xtstamp_a1.device);
+             ts_offs_err = ktime_divns(ktime_a, 2);
+             ktime_a = ktime_add_ns(xtstamp_a1.device, (u64)ts_offs_err);
+             ts_a1 = ktime_to_timespec64(ktime_a);
This is just wrong.

     read(a1);
     read(b);
     read(a2);

You _CANNOT_ assume that (a1 + ((a2 - a1) / 2) is anywhere close to the
point in time where 'b' is read. This code is preemtible and
interruptible. I explained this to you before.

Your explanation in the comment above the function is just wishful
thinking.
you explained it before, but still it is better then two consecutive
user space calls which are also preemptible
and the userspace to kernel context switch time is added.
quoted
+ * In other cases: Read clock_a twice (before, and after reading clock_b) and
+ * average these times – to be as close as possible to the time we read clock_b.
Can you please sit down and provide a precise technical description of
the problem you are trying to solve and explain your proposed solution
at the conceptual level instead of throwing out random implementations
every few days?
Thanks,

        tglx
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help