Re: [RFC PATCH 00/11] rv: Add scheduler specification monitors
From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-02-07 11:36:32
On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote:
> Hi Gabriele,
>
> On 06/02/25 09:09, Gabriele Monaco wrote:
> > This patchset starts including adapted scheduler specifications
> > from Daniel's task model [1].
>
> Thanks a lot for working on this. Apart from being cool stuff per-se,
> it means a lot personally to see Daniel's work continuing to be
> developed.
>
> > As the model is fairly complicated, it is split in several
> > generators and specifications. The tool used to create the model
> > can output a unified model, but that would be hardly readable (9k
> > states).
> >
> > RV allows monitors to run and react concurrently. Running the
> > cumulative model is equivalent to running single components using
> > the same reactors, with the advantage that it's easier to point out
> > which specification failed in case of error.
> >
> > We allow this by introducing nested monitors, in short, the sysfs
> > monitor folder will contain a monitor named sched, which is nothing
> > but an empty container for other monitors. Controlling the sched
> > monitor (enable, disable, set reactors) controls all nested
> > monitors.
> >
> > The task model proposed by Daniel includes 12 generators and 33
> > specifications. The generators are good for documentation but are
> > usually implied in some specifications.
> >
> > Not all monitors work out of the box, mainly because of those
> > reasons:
> > * need to distinguish if preempt disable leads to schedule
> > * need to distinguish if irq disable comes from an actual irq
> > * assumptions not always true on SMP
> >
> > The original task model was designed for PREEMPT_RT and this
> > patchset is only tested on an upstream kernel with full preemption
> > enabled.
>
> I played with your additions a bit and I was able to enable/disable
> monitors, switch reactors, etc., w/o noticing any issue.

Thanks for trying it out!
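
For readers following along, the nested-monitor behaviour described in the cover letter can be pictured with a small userspace sketch. This is only an illustrative model in Python, not the kernel implementation: the class names, the `set_reactor` method and the child monitor called `other` are hypothetical, and the only thing it mirrors is the stated rule that controlling the container controls every nested monitor.

```python
class Monitor:
    """Hypothetical stand-in for a single RV monitor (not kernel code)."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.reactor = "nop"  # assumed default, for illustration only

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def set_reactor(self, reactor):
        self.reactor = reactor


class ContainerMonitor(Monitor):
    """Empty container: it checks nothing itself and only forwards
    control operations to the monitors nested inside it."""

    def __init__(self, name, children):
        super().__init__(name)
        self.children = list(children)

    def enable(self):
        super().enable()
        for child in self.children:
            child.enable()

    def disable(self):
        super().disable()
        for child in self.children:
            child.disable()

    def set_reactor(self, reactor):
        super().set_reactor(reactor)
        for child in self.children:
            child.set_reactor(reactor)


# "tss" is named in this thread; "other" is a made-up placeholder.
sched = ContainerMonitor("sched", [Monitor("tss"), Monitor("other")])
sched.enable()
sched.set_reactor("printk")
```

After these two calls every nested monitor is enabled and uses the same reactor, which is the property that makes running the split specifications equivalent to running the unified model.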

> I wonder if you also had ways to test that the monitors actually
> react properly in case of erroneous conditions (so that we can see a
> reactor actually react :).

Well, in my understanding, reactors should fire if there is a problem
either in the kernel or in the model logic. While trying things out, I
had more than a few models failing and I excluded them from this patch
because they are not stable.

Ideally you shouldn't be seeing errors using those monitors, unless you
(un)intentionally break something in the kernel.

That said, the monitor task switch while scheduling (tss) imposes
context switches whenever we reach the scheduler. Daniel modified the
sched_switch tracepoint to fire also if prev==next (when in fact no
switch is happening); I'm assuming the tss specification is partly why
that was necessary.

During my tests, I didn't apply that change, yet I've never seen the
monitor failing. If you manage to call __schedule while the next picked
task is the same as the currently running one, you should see an error
and a reactor firing.

Since I couldn't reproduce the above case, I ignored it for the current
RFC. However, if that's possible in practice, we should perhaps add
another event describing this fake switch to prevent the monitor from
failing.

Thanks,
Gabriele
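
To make the failing case concrete, here is a minimal sketch of how a monitor with the property described above would react. It is a simplified, hypothetical automaton written in Python for illustration, not the tss specification from the patchset: the state names, the event names and the `run_monitor` helper are all assumptions. It encodes only the rule stated in the message, namely that once the scheduler is entered, a context switch must be observed before it is left.

```python
# Hypothetical states; the real specification may differ.
THREAD, SCHED, SWITCHED = "thread", "sched", "switched"

# Allowed transitions: (current state, event) -> next state.
# Any pair not listed here is treated as a violation.
TRANSITIONS = {
    (THREAD, "schedule_entry"): SCHED,
    (SCHED, "sched_switch"): SWITCHED,
    (SWITCHED, "schedule_exit"): THREAD,
}


def run_monitor(events, reactor=None):
    """Feed events to the automaton; return True if all were allowed.

    On the first event with no allowed transition, call the optional
    reactor with (state, event) and stop, returning False.
    """
    state = THREAD
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            if reactor is not None:
                reactor(state, event)
            return False
        state = next_state
    return True


# Normal case: the scheduler is entered and a switch happens.
ok = run_monitor(["schedule_entry", "sched_switch", "schedule_exit"])

# prev == next case without the tracepoint change: no sched_switch
# event is emitted, so leaving the scheduler is a violation.
violations = []
failed = not run_monitor(
    ["schedule_entry", "schedule_exit"],
    reactor=lambda state, event: violations.append((state, event)),
)
```

In this model the second trace triggers the reactor on `schedule_exit`, which corresponds to the error expected when `__schedule` picks the currently running task. Adding a separate event for the fake switch, as suggested above, would amount to one more allowed transition out of the `sched` state.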