Re: [RFC PATCH 00/11] rv: Add scheduler specification monitors

From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-02-07 11:36:32
Also in: lkml


On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote:

Hi Gabriele,

On 06/02/25 09:09, Gabriele Monaco wrote:

quoted

This patchset starts including adapted scheduler specifications from
Daniel's task model [1].

Thanks a lot for working on this. Apart from being cool stuff per se, it
means a lot personally to see Daniel's work continuing to be developed.

quoted

As the model is fairly complicated, it is split in several generators
and specifications. The tool used to create the model can output a
unified model, but that would be hardly readable (9k states).

RV allows monitors to run and react concurrently. Running the cumulative
model is equivalent to running single components using the same
reactors, with the advantage that it's easier to point out which
specification failed in case of error.

We allow this by introducing nested monitors. In short, the sysfs
monitor folder will contain a monitor named sched, which is nothing but
an empty container for other monitors. Controlling the sched monitor
(enable, disable, set reactors) controls all nested monitors.

The task model proposed by Daniel includes 12 generators and 33
specifications. The generators are good for documentation but are
usually implied in some specifications.
Not all monitors work out of the box, mainly for these reasons:
* need to distinguish if preempt disable leads to schedule
* need to distinguish if irq disable comes from an actual irq
* assumptions not always true on SMP

The original task model was designed for PREEMPT_RT and this patchset is
only tested on an upstream kernel with full preemption enabled.

I played with your additions a bit and I was able to enable/disable
monitors, switch reactors, etc., w/o noticing any issue.

Thanks for trying it out!

I wonder if you also had ways to test that the monitors actually react
properly in case of erroneous conditions (so that we can see a reactor
actually react :).

Well, in my understanding, reactors should fire if there is a problem
either in the kernel or in the model logic.
While trying things out, I had more than a few models failing and I
excluded them from this patch because they are not stable.

Ideally you shouldn't be seeing errors using those monitors, unless you
(un)intentionally break something in the kernel.

That said, the task switch while scheduling (tss) monitor requires a
context switch whenever we reach the scheduler.
Daniel modified the sched_switch tracepoint to fire also if prev==next
(when in fact no switch is happening). I'm assuming the tss specification
is partly why that was necessary.
During my tests, I didn't apply that change, yet I've never seen the
monitor fail.

If you manage to call __schedule while the next picked task is the same
as the currently running one, you should see an error and a reactor
firing.

Since I couldn't reproduce the above case, I ignored it for the current
RFC. However, if that's possible in practice, we should perhaps add
another event describing this fake switch to prevent the monitor from
failing.
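
To make the failing case concrete, here is a minimal userspace sketch of
the behaviour described above, written as a small deterministic automaton.
It is only an illustration under my own assumptions: the state and event
names (TSS_THREAD, EV_SCHEDULE_ENTRY, ...) and the helpers tss_step() and
tss_run() are hypothetical and are not the automaton generated for the tss
monitor in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states: illustrative only, not taken from the patchset. */
enum tss_state {
	TSS_THREAD,          /* running outside the scheduler */
	TSS_SCHED_NO_SWITCH, /* entered the scheduler, no switch seen yet */
	TSS_SCHED_SWITCHED,  /* entered the scheduler and saw a switch */
	TSS_STATE_MAX
};

/* Hypothetical events: illustrative only, not taken from the patchset. */
enum tss_event {
	EV_SCHEDULE_ENTRY,
	EV_SCHED_SWITCH,
	EV_SCHEDULE_EXIT,
	TSS_EVENT_MAX
};

#define TSS_INVALID (-1)

/* Transition table: tss_next[state][event] is the next state or TSS_INVALID. */
static const int tss_next[TSS_STATE_MAX][TSS_EVENT_MAX] = {
	[TSS_THREAD]          = { TSS_SCHED_NO_SWITCH, TSS_INVALID, TSS_INVALID },
	[TSS_SCHED_NO_SWITCH] = { TSS_INVALID, TSS_SCHED_SWITCHED, TSS_INVALID },
	[TSS_SCHED_SWITCHED]  = { TSS_INVALID, TSS_INVALID, TSS_THREAD },
};

/*
 * Feed one event to the automaton. Returns true if the event is accepted,
 * false if it is not allowed in the current state (the point where a
 * reactor would fire). The state is left unchanged on rejection.
 */
static bool tss_step(enum tss_state *state, enum tss_event ev)
{
	assert(*state < TSS_STATE_MAX && ev < TSS_EVENT_MAX);

	int next = tss_next[*state][ev];

	if (next == TSS_INVALID)
		return false;
	*state = (enum tss_state)next;
	return true;
}

/*
 * Replay a trace from the initial state. Returns the index of the first
 * rejected event, or n if every event was accepted.
 */
static size_t tss_run(const enum tss_event *trace, size_t n)
{
	enum tss_state state = TSS_THREAD;

	for (size_t i = 0; i < n; i++)
		if (!tss_step(&state, trace[i]))
			return i;
	return n;
}
```

In this sketch, the trace entry, switch, exit is accepted, while entry
followed directly by exit is rejected at the exit event. That second trace
corresponds to calling __schedule when the picked task is the current one
and the tracepoint does not fire. An extra event for the fake switch would
amount to one more column in the table that moves TSS_SCHED_NO_SWITCH to
TSS_SCHED_SWITCHED.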

Thanks,
Gabriele
