Re: [PATCH 4/5] sched: Add rt task enqueue/dequeue trace points
From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-08-01 11:04:57
Also in:
lkml
On Fri, 2025-08-01 at 15:26 +0530, K Prateek Nayak wrote:
There are a few more cases with delayed dequeue: 1. A delayed task can be reenqueued before it is completely dequeued and will lead to a enqueue -> enqueue transition if we don't trace the first dequeue. 2. There are cases in set_user_nice() and __sched_setscheduler() where a delayed task is dequeued in delayed state and be put back in the cfs_rq in delayed state - an attempt to handle 1. can trip this. Best I could think of is the below diff on top of your suggestion where a "delayed -> reenqueue" is skipped but the case 2. is captures in case one needs to inspect some properties from set_user_nice() / __sched_setscheduler(): (only build tested on top of the diff you had pasted)
Hello Prateek, thanks for the comments, this looks much more convoluted than I would have expected. As Nam pointed out, currently RV is not going to rely on those events for fair tasks (existing monitors are fine with tracepoints like wakeup/set_state/switch). For the time being it might be better just add the tracepoints in the RT enqueue/dequeue (the only needed for this series) and not complicate things. At most we may want to keep tracepoint names general, allowing the tracing call to be added later to other locations (or moved to a general one) without changing too much, besides existing users. And perhaps a comment saying the tracepoints are currently only supported on RT would do. Anyway, that's your call Nam, I'm fine with your initial proposal as well, unless some scheduler guys complain. Thanks, Gabriele
quoted hunk ↗ jump to hunk
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9598984bee8d..1fc5a97bba6b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c@@ -2071,7 +2071,8 @@ unsigned long get_wchan(struct task_struct *p)void enqueue_task(struct rq *rq, struct task_struct *p, int flags) { - trace_enqueue_task_tp(rq->cpu, p); + if (!p->se.sched_delayed || !move_entity(flags)) + trace_enqueue_task_tp(rq->cpu, p); if (!(flags & ENQUEUE_NOCLOCK)) update_rq_clock(rq);diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b173a059315c..1e2a636d6804 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c@@ -5444,7 +5444,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, structsched_entity *se, int flags) * put back on, and if we advance min_vruntime, we'll be placed back * further than we started -- i.e. we'll be penalized. */ - if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE) + if (move_entity(flags)) update_min_vruntime(cfs_rq); if (flags & DEQUEUE_DELAYED)@@ -7054,6 +7054,7 @@ static int dequeue_entities(struct rq *rq,struct sched_entity *se, int flags) /* Fix-up what dequeue_task_fair() skipped */ hrtick_update(rq); + trace_dequeue_task_tp(rq->cpu, p); /* * Fix-up what block_task() skipped.diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 7936d4333731..33897d35744a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c@@ -1196,19 +1196,6 @@ void dec_rt_tasks(struct sched_rt_entity*rt_se, struct rt_rq *rt_rq) dec_rt_group(rt_se, rt_rq); } -/* - * Change rt_se->run_list location unless SAVE && !MOVE - * - * assumes ENQUEUE/DEQUEUE flags match - */ -static inline bool move_entity(unsigned int flags) -{ - if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE) - return false; - - return true; -} - static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_array *array) { list_del_init(&rt_se->run_list);diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index d3f33d10c58c..37730cd834ba 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h@@ -2361,6 +2361,20 @@ extern constu32 sched_prio_to_wmult[40]; #define RETRY_TASK ((void *)-1UL) +/* + * Checks for a SAVE/RESTORE without MOVE. Returns false if + * SAVE and !MOVE. + * + * Assumes ENQUEUE/DEQUEUE flags match. + */ +static inline bool move_entity(unsigned int flags) +{ + if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE) + return false; + + return true; +} + struct affinity_context { const struct cpumask *new_mask; struct cpumask *user_mask; --- Thoughts?