Re: [PATCH 4/5] sched: Add rt task enqueue/dequeue trace points
From: Nam Cao <hidden>
Date: 2025-08-01 07:29:55
Also in:
lkml
Subsystem:
scheduler, the rest, tracing · Maintainers:
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds, Steven Rostedt, Masami Hiramatsu
On Fri, Aug 01, 2025 at 09:12:08AM +0530, K Prateek Nayak wrote:
Just thinking out loud, putting this tracepoint here can lead to a
"dequeued -> dequeued" transition for fair task when they are in delayed
dequeue state.
dequeue_task(p)
trace_dequeue_task_tp(p) # First time
dequeue_task_fair(p)
p->se.delayed = 1
...
<sched_switch> # p is still delayed
...
sched_setscheduler(p)
if (prev_class != next_class && p->se.sched_delayed)
dequeue_task(p, DEQUEUE_DELAYED);
trace_dequeue_task_tp(p) # Second time
It is not an issue as such but it might come as a surprise if users are
expecting a behavior like below which would be the case for !fair task
currently (and for all tasks before v6.12):
digraph state_automaton {
center = true;
size = "7,11";
{node [shape = plaintext, style=invis, label=""] "__init_enqueue_dequeue_cycle"};
{node [shape = ellipse] "enqueued"};
{node [shape = ellipse] "dequeued"};
"__init_enqueue_dequeue_cycle" -> "enqueued";
"__init_enqueue_dequeue_cycle" -> "dequeued";
"enqueued" [label = "enqueued", color = green3];
"enqueued" -> "dequeued" [ label = "dequeue_task" ];
"dequeued" [label = "dequeued", color = red];
"dequeued" -> "enqueued" [ label = "enqueue_task" ];
{ rank = min ;
"__init_enqueue_dequeue_cycle";
"dequeued";
"enqueued";
}
}
Another:
"dequeued" -> "dequeued" [ label = "dequeue_task" ];
edge would be needed in that case for >= v6.12. It is probably nothing
and can be easily handled by the users if they run into it but just
putting it out there for the record in case you only want to consider a
complete dequeue as "dequeued". Feel free to ignore since I'm completely
out of my depth when it comes to the usage of RV in the field :)Ah, thanks for pointing this out. I do want to only consider complete dequeue as "dequeued". These tracepoints are not visible from userspace, and RV does not care about enqueue/dequeue of fair tasks at the moment, so it is not a problem for now. But as a precaution, I trust the below patch will do. Nam
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index c38f12f7f903..b50668052f99 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h@@ -906,6 +906,14 @@ DECLARE_TRACE(dequeue_task_rt, TP_PROTO(int cpu, struct task_struct *task), TP_ARGS(cpu, task)); +DECLARE_TRACE(enqueue_task, + TP_PROTO(int cpu, struct task_struct *task), + TP_ARGS(cpu, task)); + +DECLARE_TRACE(dequeue_task, + TP_PROTO(int cpu, struct task_struct *task), + TP_ARGS(cpu, task)); + #endif /* _TRACE_SCHED_H */ /* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b485e0639616..553c08a63395 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c@@ -2077,6 +2077,8 @@ unsigned long get_wchan(struct task_struct *p) void enqueue_task(struct rq *rq, struct task_struct *p, int flags) { + trace_enqueue_task_tp(rq->cpu, p); + if (!(flags & ENQUEUE_NOCLOCK)) update_rq_clock(rq);
@@ -2119,7 +2121,11 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags) * and mark the task ->sched_delayed. */ uclamp_rq_dec(rq, p); - return p->sched_class->dequeue_task(rq, p, flags); + if (p->sched_class->dequeue_task(rq, p, flags)) { + trace_dequeue_task_tp(rq->cpu, p); + return true; + } + return false; } void activate_task(struct rq *rq, struct task_struct *p, int flags)