Re: [PATCH 5/5] rv: Add rts monitor

From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-08-06 08:15:54
Also in: lkml

On Tue, 2025-08-05 at 17:45 +0200, Nam Cao wrote:

On Tue, Aug 05, 2025 at 02:22:17PM +0200, Nam Cao wrote:


On Tue, Aug 05, 2025 at 10:40:30AM +0200, Gabriele Monaco wrote:


Hello Nam,

I just built and booted up the monitor in a VM (virtme-ng), the
configuration has preemptirq tracepoints and all monitors so far (as we
have seen earlier, it doesn't build if rtapp monitors are not there
because of the circular dependency in the tracepoints).

All I did was to enable the monitor and printk reactor, but I get a
whole lot of errors (as in, I need to quit the VM for it to stop):

[ 1537.699834] rv: rts: 7: violation detected
[ 1537.699930] rv: rts: 3: violation detected
[ 1537.701827] rv: rts: 6: violation detected
[ 1537.704894] rv: rts: 0: violation detected
[ 1537.704925] rv: rts: 0: violation detected
[ 1537.704988] rv: rts: 3: violation detected
[ 1537.705019] rv: rts: 3: violation detected
[ 1537.705998] rv: rts: 0: violation detected
[ 1537.706024] rv: rts: 0: violation detected
[ 1537.709875] rv: rts: 6: violation detected
[ 1537.709921] rv: rts: 6: violation detected
[ 1537.711241] rv: rts: 6: violation detected

Curiously enough, I only see those CPUs (0, 3, 6 and 7).
Other runs have different CPUs but always a small subset (e.g. 10-15,
6-7 only 2).
It doesn't always occur, but enabling/disabling the monitor might help
trigger it.

Any idea what is happening?

There are two issues:

  - When the monitor is disabled then enabled, the number of queued
    tasks is not reset. The monitor may mistakenly think there are
    queued RT tasks when there aren't.

  - The enqueue tracepoint is registered before the dequeue tracepoint.
    Therefore there may be an enqueue followed by a dequeue, but the
    monitor misses the latter.

The first issue can be fixed by resetting the queued task number at
enabling.

Mmh good catch, indeed you have a counter separated from the LTL thing
here.

For the second issue, LTL monitors need something similar to
da_monitor_enabled_##name(void). But a quick workaround is reordering
the tracepoint registrations.
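
To illustrate why the registration order matters, here is a minimal
userspace sketch, not kernel code. All names (struct model, ev_enqueue,
ev_dequeue, race_window) are hypothetical, and the guard against
decrementing below zero is an assumption of this model rather than
something taken from rts.c. It models one RT task being enqueued and
then dequeued in the window between the two probe attachments:

```c
#include <stdbool.h>

/* Hypothetical model of the monitor state; not a kernel structure. */
struct model {
	bool enqueue_attached;	/* enqueue probe is live */
	bool dequeue_attached;	/* dequeue probe is live */
	unsigned int nr_queued;	/* the monitor's view of queued RT tasks */
};

/* An enqueue event is only counted if the enqueue probe is attached. */
void ev_enqueue(struct model *m)
{
	if (m->enqueue_attached)
		m->nr_queued++;
}

/*
 * A dequeue event is only counted if the dequeue probe is attached.
 * The "> 0" guard is an assumption of this sketch.
 */
void ev_dequeue(struct model *m)
{
	if (m->dequeue_attached && m->nr_queued > 0)
		m->nr_queued--;
}

/*
 * Attach one probe, let a task be enqueued and dequeued, then attach
 * the other probe. The real number of queued tasks at the end is 0;
 * the return value is what the monitor believes.
 */
unsigned int race_window(bool enqueue_first)
{
	struct model m = { 0 };

	if (enqueue_first)
		m.enqueue_attached = true;
	else
		m.dequeue_attached = true;

	ev_enqueue(&m);
	ev_dequeue(&m);

	m.enqueue_attached = true;
	m.dequeue_attached = true;

	return m.nr_queued;
}
```

In this model race_window(true) leaves the monitor believing one task
is still queued, while race_window(false) leaves the count at zero.
Reordering only narrows this particular window; a task enqueued before
either probe is attached is still invisible to the counter.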

I didn't make it in time before your V2; I assume you have already
solved this, so feel free to ignore it.

You kinda have something like the da_monitor_enabled: the
rv_ltl_all_atoms_known.

I wonder if you could define LTL_RT_TASK_ENQUEUED only when you
actually know it (or are reasonably sure based on your internal
counter). Or at least not set all atoms until the monitor is fully set
up.
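
As a rough illustration of that idea, here is a userspace sketch, not
the LTL monitor API. The names (enum atom_state, struct cpu_model,
update_atom, can_evaluate) and the count_trusted flag are all
hypothetical; how the monitor would actually decide that its counter
is trustworthy is left open here. The atom stays unknown until the
counter is trusted, and nothing is evaluated while it is unknown:

```c
#include <stdbool.h>

/* Hypothetical tri-state value for a single atom. */
enum atom_state { ATOM_UNKNOWN, ATOM_FALSE, ATOM_TRUE };

/* Hypothetical per-CPU model; not a kernel structure. */
struct cpu_model {
	bool count_trusted;		/* assumption: set once nr_queued is reliable */
	unsigned int nr_queued;		/* internal counter of queued RT tasks */
	enum atom_state rt_task_enqueued;
};

/* Derive the atom from the counter, but only once the counter is trusted. */
void update_atom(struct cpu_model *c)
{
	if (!c->count_trusted) {
		c->rt_task_enqueued = ATOM_UNKNOWN;
		return;
	}
	c->rt_task_enqueued = c->nr_queued ? ATOM_TRUE : ATOM_FALSE;
}

/* The property is only checked when the atom has a known value. */
bool can_evaluate(const struct cpu_model *c)
{
	return c->rt_task_enqueued != ATOM_UNKNOWN;
}
```

With this shape, a stale or incomplete counter delays verification
instead of producing a spurious violation.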

Anyway, reordering the tracepoint registrations is likely necessary
whatever you do, but I'm afraid a problem like this can occur pretty
often with this type of monitor.

Thanks,
Gabriele


So with the below diff, I no longer see the issue.

Thanks again for noticing this!

Nam

diff --git a/kernel/trace/rv/monitors/rts/rts.c b/kernel/trace/rv/monitors/rts/rts.c
index 473004b673c5..3ddbf09db0dd 100644
--- a/kernel/trace/rv/monitors/rts/rts.c
+++ b/kernel/trace/rv/monitors/rts/rts.c
@@ -81,14 +81,21 @@ static void handle_sched_switch(void *data, bool preempt, struct task_struct *pr
 
 static int enable_rts(void)
 {
+	unsigned int cpu;
 	int retval;
 
 	retval = ltl_monitor_init();
 	if (retval)
 		return retval;
 
-	rv_attach_trace_probe("rts", enqueue_task_rt_tp, handle_enqueue_task_rt);
+	for_each_possible_cpu(cpu) {
+		unsigned int *queued = per_cpu_ptr(&nr_queued, cpu);
+
+		*queued = 0;
+	}
+
 	rv_attach_trace_probe("rts", dequeue_task_rt_tp, handle_dequeue_task_rt);
+	rv_attach_trace_probe("rts", enqueue_task_rt_tp, handle_enqueue_task_rt);
 	rv_attach_trace_probe("rts", sched_switch, handle_sched_switch);
 
 	return 0;
