Re: [PATCH v1 03/11] sched: Add sched tracepoints for RV task model
From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-02-11 12:54:50
Also in:
lkml
On Tue, 2025-02-11 at 12:03 +0100, Peter Zijlstra wrote:
On Tue, Feb 11, 2025 at 08:46:10AM +0100, Gabriele Monaco wrote:quoted
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9632e3318e0d6..9ff0658095240 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h@@ -46,6 +46,7 @@#include <linux/rv.h> #include <linux/livepatch_sched.h> #include <linux/uidgid_types.h> +#include <linux/tracepoint-defs.h> #include <asm/kmap_size.h> /* task_struct member predeclarations (sorted alphabetically): */@@ -186,6 +187,12 @@ struct user_event_mm;# define debug_rtlock_wait_restore_state() do { } while (0) #endif +#define trace_set_current_state(state_value) \ + do { \ + if (tracepoint_enabled(sched_set_state_tp)) \ + do_trace_set_current_state(state_value); \ + } while (0)Yeah, this is much nicer, thanks!quoted
/* * set_current_state() includes a barrier so that the write of current->__state * is correctly serialised wrt the caller's subsequent test of whether to@@ -226,12 +233,14 @@ struct user_event_mm;#define __set_current_state(state_value) \ do { \ debug_normal_state_change((state_value)); \ + trace_set_current_state(state_value); \ WRITE_ONCE(current->__state, (state_value)); \ } while (0) #define set_current_state(state_value) \ do { \ debug_normal_state_change((state_value)); \ + trace_set_current_state(state_value); \ smp_store_mb(current->__state, (state_value)); \ } while (0)@@ -247,6 +256,7 @@ struct user_event_mm;\ raw_spin_lock_irqsave(¤t->pi_lock, flags); \ debug_special_state_change((state_value)); \ + trace_set_current_state(state_value); \ WRITE_ONCE(current->__state, (state_value)); \ raw_spin_unlock_irqrestore(¤t->pi_lock, flags); \ } while (0)@@ -282,6 +292,7 @@ struct user_event_mm;raw_spin_lock(¤t->pi_lock); \ current->saved_state = current->__state; \ debug_rtlock_wait_set_state(); \ + trace_set_current_state(TASK_RTLOCK_WAIT); \ WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \ raw_spin_unlock(¤t->pi_lock); \ } while (0);@@ -291,6 +302,7 @@ struct user_event_mm;lockdep_assert_irqs_disabled(); \ raw_spin_lock(¤t->pi_lock); \ debug_rtlock_wait_restore_state(); \ + trace_set_current_state(TASK_RUNNING); \ WRITE_ONCE(current->__state, current->saved_state); \ current->saved_state = TASK_RUNNING; \ raw_spin_unlock(¤t->pi_lock); \quoted
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 165c90ba64ea9..9e33728bb57e8 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c@@ -491,6 +491,15 @@ sched_core_dequeue(struct rq *rq, structtask_struct *p, int flags) { } #endif /* CONFIG_SCHED_CORE */ +/* need a wrapper since we may need to trace from modules */ +EXPORT_TRACEPOINT_SYMBOL(sched_set_state_tp);_GPLquoted
+ +void do_trace_set_current_state(int state_value) +{ + trace_sched_set_state_tp(current, current->__state, state_value);Should this be: __do_trace_sched_set_state_tp() ?
Mmh, you mean avoiding the static_branch_unlikely in the helper function, since it's supposed to be used before calling it? I checked all references of tracepoint-defs.h's tracepoint_enabled (the macro actually doing this static_branch_unlikely) and it seems they never use the __do_trace_ function but only call trace_, in fact repeating the branch. Your suggestion seems definitely cleaner but I wonder if that can have unwanted consequences (we are exposing a function unconditionally calling the tracepoint) and for that reason it was preferred not to do it in all other occurrences? I'm not sure if saving one branch would justify the change only in this case. Thoughts?
quoted
+} +EXPORT_SYMBOL(do_trace_set_current_state);_GPL
I'm absolutely not against this change but, out of curiosity, would that imply non-GPL modules are not going to be able to sleep going forward? At least not using this pattern. Thanks, Gabriele