Thread (60 messages), 5 authors, 2025-09-04

Re: [RFC PATCH 14/17] sched: Add deadline tracepoints

From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-08-19 10:34:31
Also in: lkml


On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
quoted
Hi!

On 14/08/25 17:08, Gabriele Monaco wrote:
quoted
Add the following tracepoints:

* sched_dl_throttle(dl):
    Called when a deadline entity is throttled
* sched_dl_replenish(dl):
    Called when a deadline entity's runtime is replenished
* sched_dl_server_start(dl):
    Called when a deadline server is started
* sched_dl_server_stop(dl, hard):
    Called when a deadline server is stopped (hard) or put to idle
    waiting for the next period (!hard)

Those tracepoints can be useful to validate the deadline scheduler with
RV and are not exported to tracefs.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/trace/events/sched.h | 55 ++++++++++++++++++++++++++++++++++++
 kernel/sched/deadline.c      |  8 ++++++
 2 files changed, 63 insertions(+)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7b2645b50e78..f34cc1dc4a13 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
 			__entry->oldprio, __entry->newprio)
 );
 
+/*
+DECLARE_EVENT_CLASS(sched_dl_template,
+
+	TP_PROTO(struct sched_dl_entity *dl),
+
+	TP_ARGS(dl),
+
+	TP_STRUCT__entry(
+		__field(  struct task_struct *,	tsk		)
+		__string( comm,		dl->dl_server ? "server" : container_of(dl, struct task_struct, dl)->comm	)
+		__field(  pid_t,	pid		)
+		__field(  s64,		runtime		)
+		__field(  u64,		deadline	)
+		__field(  int,		dl_yielded	)
I wonder if, while we are at it, we want to print all the other fields
as well (they might turn out to be useful). That would be

 .:: static (easier to retrieve with just a trace)
 - dl_runtime
 - dl_deadline
 - dl_period

 .:: behaviour (RECLAIM)
 - flags

 .:: state
 - dl_ bool flags in addition to dl_yielded
All these things are used as _tp(). That means they don't have trace
buffer entries ever, why fill out fields?
Right, that is a relic of how I wrote it initially: this whole block
is commented out (which is indeed confusing and barely noticeable in
the patch).
The tracepoints are in fact not exported to tracefs and do not use
the print format.

I should have removed this; the real ones are at the bottom of the
file.
quoted
quoted
+	),
+
+	TP_fast_assign(
+		__assign_str(comm);
+		__entry->pid		= dl->dl_server ? -1 : container_of(dl, struct task_struct, dl)->pid;
+		__entry->runtime	= dl->runtime;
+		__entry->deadline	= dl->deadline;
+		__entry->dl_yielded	= dl->dl_yielded;
+	),
+
+	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld yielded=%d",
                                                        ^^^
							llu ?
As above, this should all go away.
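
For reference, the `llu ?` remark concerns the `deadline` field, which is declared as `u64` but printed with the signed `%lld` conversion. The userspace sketch below is not kernel code; the helper names are hypothetical and it only illustrates how the two conversions diverge once a value reaches 2^63. Absolute deadlines in nanoseconds are unlikely to get that large, so the point is mostly about matching the specifier to the declared type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a u64 deadline with the signed conversion used in the quoted TP_printk(). */
static int fmt_deadline_signed(char *buf, size_t len, uint64_t deadline)
{
	/* The cast is implementation-defined for values >= 2^63; it wraps on common ABIs. */
	return snprintf(buf, len, "deadline=%lld", (long long)deadline);
}

/* Format the same value with the unsigned conversion suggested in the review. */
static int fmt_deadline_unsigned(char *buf, size_t len, uint64_t deadline)
{
	return snprintf(buf, len, "deadline=%llu", (unsigned long long)deadline);
}

/* Return 1 if the two conversions render the value differently, 0 otherwise. */
static int deadline_formats_differ(uint64_t deadline)
{
	char a[64], b[64];

	fmt_deadline_signed(a, sizeof(a), deadline);
	fmt_deadline_unsigned(b, sizeof(b), deadline);
	return strcmp(a, b) != 0;
}
```

For small values both helpers produce the same string; they only differ when the top bit of the value is set.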
quoted
quoted
+			__get_str(comm), __entry->pid,
+			__entry->runtime, __entry->deadline,
+			__entry->dl_yielded)
+);
...
quoted
@@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 
 throttle:
 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
+		trace_sched_dl_throttle_tp(dl_se);
 		dl_se->dl_throttled = 1;
I believe we also need to trace the dl_check_constrained_dl()
throttle, please take a look.
Probably yes. Strangely, I couldn't see failures without it, but that
may be down to my test setup. I'm going to have a look.
quoted
Also - we discussed this point a little already offline - but I
still wonder if we have to do anything special for dl-server defer.
Those entities are started as throttled until 0-lag, so maybe we
should still trace them explicitly as so?
The naming might need a bit of a consistency check here, but for the
monitor, the server is running, armed or preempted. Before the 0-lag,
it will never be running, so it will stay as armed (fair tasks running)
or preempted (rt tasks running).

armed and preempted have the _throttled version just to indicate an
explicit throttle event occurred.

We might want to start it in the armed_throttled if we are really
guaranteed not to see a throttle event, but I think that would
complicate the model considerably.
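
To make the state naming concrete, here is a rough userspace sketch of the five states mentioned above. It is not the monitor from the series: the enum names, the helper names and the exact transitions are assumptions made for illustration only, and states not covered by the description are simply left unchanged.

```c
#include <assert.h>

/* Server states named in the message; the identifiers are hypothetical. */
enum dl_mon_state {
	DL_RUNNING,
	DL_ARMED,               /* fair tasks running */
	DL_PREEMPTED,           /* rt tasks running */
	DL_ARMED_THROTTLED,     /* armed, explicit throttle event seen */
	DL_PREEMPTED_THROTTLED, /* preempted, explicit throttle event seen */
};

/* Record an explicit throttle event (assumed transitions). */
static enum dl_mon_state on_throttle(enum dl_mon_state s)
{
	switch (s) {
	case DL_ARMED:
		return DL_ARMED_THROTTLED;
	case DL_PREEMPTED:
		return DL_PREEMPTED_THROTTLED;
	default:
		return s; /* not modelled in this sketch */
	}
}

/* Drop the throttled marker on replenish (assumed transitions). */
static enum dl_mon_state on_replenish(enum dl_mon_state s)
{
	switch (s) {
	case DL_ARMED_THROTTLED:
		return DL_ARMED;
	case DL_PREEMPTED_THROTTLED:
		return DL_PREEMPTED;
	default:
		return s; /* not modelled in this sketch */
	}
}

/*
 * Initial state for a deferred server: DL_ARMED if a throttle event is
 * expected to be traced, DL_ARMED_THROTTLED if it is guaranteed not to be.
 */
static enum dl_mon_state server_initial_state(int no_throttle_event_expected)
{
	return no_throttle_event_expected ? DL_ARMED_THROTTLED : DL_ARMED;
}
```

The last helper captures the open question: starting in the throttled variant only makes sense if no throttle event is ever traced for the deferred start.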

We could instead validate the 0-lag concept in a separate, server-
specific model.

Does this initial model feel particularly wrong for the server case?
quoted
In addition, since it's related, maybe we should do something about
sched_switch event, that is currently not aware of deadlines,
runtimes, etc.
I'm not sure I follow you here: what relation between switch and
runtime/deadline should we enforce?

We don't really force the switch to occur promptly after throttling, is
that what you mean?
Or must a switch occur promptly again after replenishing?
As per the whole _tp() thing, you can attach to the actual
sched_switch tracepoint with a module and read whatever you want.
Yeah I believe Juri referred to model constraints on the already
existing events rather than new tracepoints here.

Thanks both,
Gabriele