Thread (15 messages) 15 messages, 2 authors, 2025-02-11

Re: [PATCH v1 03/11] sched: Add sched tracepoints for RV task model

From: Gabriele Monaco <gmonaco@redhat.com>
Date: 2025-02-11 12:54:50
Also in: lkml


On Tue, 2025-02-11 at 12:03 +0100, Peter Zijlstra wrote:
On Tue, Feb 11, 2025 at 08:46:10AM +0100, Gabriele Monaco wrote:
quoted
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d6..9ff0658095240 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -46,6 +46,7 @@
 #include <linux/rv.h>
 #include <linux/livepatch_sched.h>
 #include <linux/uidgid_types.h>
+#include <linux/tracepoint-defs.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -186,6 +187,12 @@ struct user_event_mm;
 # define debug_rtlock_wait_restore_state() do { } while (0)
 #endif
 
+#define trace_set_current_state(state_value)                     \
+ do {                                                     \
+ if (tracepoint_enabled(sched_set_state_tp))      \
+ do_trace_set_current_state(state_value); \
+ } while (0)
Yeah, this is much nicer, thanks!
quoted
 /*
  * set_current_state() includes a barrier so that the write of
current->__state
  * is correctly serialised wrt the caller's subsequent test of
whether to
@@ -226,12 +233,14 @@ struct user_event_mm;
 #define __set_current_state(state_value) \
  do { \
  debug_normal_state_change((state_value)); \
+ trace_set_current_state(state_value); \
  WRITE_ONCE(current->__state, (state_value)); \
  } while (0)
 
 #define set_current_state(state_value) \
  do { \
  debug_normal_state_change((state_value)); \
+ trace_set_current_state(state_value); \
  smp_store_mb(current->__state, (state_value)); \
  } while (0)
 
@@ -247,6 +256,7 @@ struct user_event_mm;
  \
  raw_spin_lock_irqsave(&current->pi_lock, flags); \
  debug_special_state_change((state_value)); \
+ trace_set_current_state(state_value); \
  WRITE_ONCE(current->__state, (state_value)); \
  raw_spin_unlock_irqrestore(&current->pi_lock, flags); \
  } while (0)
@@ -282,6 +292,7 @@ struct user_event_mm;
  raw_spin_lock(&current->pi_lock); \
  current->saved_state = current->__state; \
  debug_rtlock_wait_set_state(); \
+ trace_set_current_state(TASK_RTLOCK_WAIT); \
  WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
  raw_spin_unlock(&current->pi_lock); \
  } while (0);
@@ -291,6 +302,7 @@ struct user_event_mm;
  lockdep_assert_irqs_disabled(); \
  raw_spin_lock(&current->pi_lock); \
  debug_rtlock_wait_restore_state(); \
+ trace_set_current_state(TASK_RUNNING); \
  WRITE_ONCE(current->__state, current->saved_state); \
  current->saved_state = TASK_RUNNING; \
  raw_spin_unlock(&current->pi_lock); \
quoted
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 165c90ba64ea9..9e33728bb57e8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -491,6 +491,15 @@ sched_core_dequeue(struct rq *rq, struct
task_struct *p, int flags) { }
 
 #endif /* CONFIG_SCHED_CORE */
 
+/* need a wrapper since we may need to trace from modules */
+EXPORT_TRACEPOINT_SYMBOL(sched_set_state_tp);
_GPL
quoted
+
+void do_trace_set_current_state(int state_value)
+{
+ trace_sched_set_state_tp(current, current->__state, state_value);
Should this be:

 __do_trace_sched_set_state_tp() ?
Mmh, you mean avoiding the static_branch_unlikely in the helper
function, since it's supposed to be used before calling it?

I checked all references of tracepoint-defs.h's tracepoint_enabled (the
macro actually doing this static_branch_unlikely) and it seems they
never use the __do_trace_ function but only call trace_, in fact
repeating the branch.

Your suggestion seems definitely cleaner but I wonder if that can have
unwanted consequences (we are exposing a function unconditionally
calling the tracepoint) and for that reason it was preferred not to do
it in all other occurrences?

I'm not sure if saving one branch would justify the change only in this
case. Thoughts?
quoted
+}
+EXPORT_SYMBOL(do_trace_set_current_state);
_GPL
I'm absolutely not against this change but, out of curiosity, would
that imply non-GPL modules are not going to be able to sleep going
forward? At least not using this pattern.

Thanks,
Gabriele
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help