Thread (35 messages), 4 authors, 2023-08-23

Re: [RFC PATCH v6 1/5] perf sched: sync state char array with the kernel

From: Ze Gao <hidden>
Date: 2023-08-04 03:19:51
Also in: linux-perf-users, lkml

On Fri, Aug 4, 2023 at 10:38 AM Ze Gao [off-list ref] wrote:
On Fri, Aug 4, 2023 at 10:21 AM Ze Gao [off-list ref] wrote:
quoted
On Thu, Aug 3, 2023 at 11:10 PM Steven Rostedt [off-list ref] wrote:
quoted
On Thu,  3 Aug 2023 04:33:48 -0400
Ze Gao [off-list ref] wrote:
quoted
Update the state char array and remove unused, stale macros.
They are kernel-internal representations whose use is no
longer encouraged.

Signed-off-by: Ze Gao <redacted>
---
 tools/perf/builtin-sched.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 9ab300b6f131..8dc8f071721c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -92,23 +92,12 @@ struct sched_atom {
      struct task_desc        *wakee;
 };

-#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
Thinking about this more, this will always be wrong. Changing it just works
for the kernel you made the change for, but if it is run on another kernel,
it's broken again.
Indeed. There is no easy way to maintain backward compatibility unless
we stop using this bizarre 'prev_state' field. Basically all of its users
suffer from this. That's why I believe a fix is needed that alerts people
not to use 'prev_state' anymore.
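To make the breakage concrete, here is a minimal sketch of a table lookup that indexes a state string by the lowest set bit of 'prev_state'. The helper name state_to_char() and the OLD_/NEW_ macro names are hypothetical and chosen only for this illustration; the two strings are the ones from the hunk above. It is not claimed to be the exact code in builtin-sched.c.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* ffs() */

/* The two mappings from the hunk above (macro names are illustrative). */
#define OLD_TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
#define NEW_TASK_STATE_TO_CHAR_STR "RSDTtXZPI"

/*
 * Hypothetical helper: map a raw prev_state bitmask to a character by
 * using the position of its lowest set bit as an index into 'table'.
 * A state of 0 (running) maps to table[0]; a bit position beyond the
 * end of the table yields '?'.
 */
static char state_to_char(const char *table, int prev_state)
{
	size_t bit = prev_state ? (size_t)ffs(prev_state) : 0;

	return bit < strlen(table) ? table[bit] : '?';
}
```

With this lookup, the same raw value decodes differently depending on which table the tool was built with: 16 gives 'Z' with the old string and 'X' with the new one, and 64 gives 'x' versus 'P'. Neither table can be right for every kernel that may have produced the trace.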
quoted
I actually wrote code once that basically just did a:

        struct trace_seq s;

        trace_seq_init(&s);
        tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);

then searched s.buffer for "prev_state=%s ", to find the state character.

That's because the kernel should always be up to date (and why I said I
needed that string in the print_fmt).
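The search step described above could look roughly like the sketch below. It assumes the formatted text (for example s.buffer after the tep_print_event() call) contains a "prev_state=" key followed by a token that ends at the next space or at the end of the string. The helper name extract_prev_state() is hypothetical, and the exact layout of the info string is an assumption rather than something this thread specifies.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the token that follows "prev_state=" in
 * 'info' into 'out' (at most out_len - 1 bytes, always NUL-terminated).
 * Returns 0 on success, -1 if the key is missing or the arguments are
 * unusable.
 */
static int extract_prev_state(const char *info, char *out, size_t out_len)
{
	static const char key[] = "prev_state=";
	const char *p;
	size_t n;

	if (!info || !out || out_len == 0)
		return -1;

	p = strstr(info, key);
	if (!p)
		return -1;
	p += sizeof(key) - 1;		/* skip past the key itself */

	n = strcspn(p, " ");		/* token ends at a space or at NUL */
	if (n >= out_len)
		n = out_len - 1;	/* truncate rather than overflow */
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}
```

A caller would pass the formatted buffer and a small char array, then treat the result as the state string printed by the kernel that recorded the event. One caveat of this simple search: a task name that itself contains "prev_state=" would be matched first, so a real implementation would need to anchor the search more carefully.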
Turning to building the state char array dynamically from the print fmt
string is a great idea. :)
On second thought, I realize this is not perfect either, since it does not
take offline use of perf into account. People might run perf on a different
machine than the one where perf.data was recorded, in which case what we get
from /sys/kernel/debug/tracing/events/sched/sched_switch/format is likely to
differ from the format that was in effect when perf.data was recorded.

So let's parse it from the TEP_PRINT_INFO output of each record instead of
building the state char array and relying on 'prev_state' again. At least this
fixes all tools that have TEP_PRINT_INFO available.

Thanks,
Ze


quoted
quoted
As perf has a tep handle, this could be a helper function to extract the
state if needed, and get rid of relying on the above character array.
I'll figure out how to make it happen.

BTW, my last concern is whether there is a better way to notify userspace
that it should avoid interpreting the task state from 'prev_state', because
otherwise the same awkward situation will happen again.
By userspace, I mean all tools that consume 'prev_state' but don't have the
print fmt available, bpf tracepoints for example.

Regards,
Ze
quoted
Thanks,
Ze
quoted
-- Steve

quoted
 /* task state bitmask, copied from include/linux/sched.h */
 #define TASK_RUNNING         0
 #define TASK_INTERRUPTIBLE   1
 #define TASK_UNINTERRUPTIBLE 2
-#define __TASK_STOPPED               4
-#define __TASK_TRACED                8
-/* in tsk->exit_state */
-#define EXIT_DEAD            16
-#define EXIT_ZOMBIE          32
-#define EXIT_TRACE           (EXIT_ZOMBIE | EXIT_DEAD)
-/* in tsk->state again */
-#define TASK_DEAD            64
-#define TASK_WAKEKILL                128
-#define TASK_WAKING          256
-#define TASK_PARKED          512

 enum thread_state {
      THREAD_SLEEPING = 0,