Re: [RFC PATCH v6 3/5] sched, tracing: reorganize fields of switch event struct
From: Ze Gao <hidden>
Date: 2023-08-03 12:54:33
Also in:
linux-perf-users, lkml
On Thu, Aug 3, 2023 at 5:18 PM Steven Rostedt [off-list ref] wrote:
> On Thu, 3 Aug 2023 04:33:50 -0400 Ze Gao [off-list ref] wrote:
>> Report priority and prev_state in 'short' to save some buffer space.
>> And also reorder the fields so that we take struct alignment into
>> consideration to make the record compact.
>
> If I were to write this, I would have written:
>
>   The prev_state field in the sched_switch event is assigned by
>   __trace_sched_switch_state(). The largest number that function will
>   return is TASK_REPORT_MAX which is just 0x100. There's no reason that
>   the prev_state field is a full 32 bits when it is using just 9 bits
>   max. In order to save space on the ring buffer, shrink the prev_state
>   to 16 bits (short). Also, change the positions of the other fields to
>   accommodate the short value of prev_state to eliminate any holes that
>   were created in the structure.
>
> See the difference?
>
>>  #ifdef CREATE_TRACE_POINTS
>> -static inline long __trace_sched_switch_state(bool preempt,
>> +static inline short __trace_sched_switch_state(bool preempt,
>>  					      unsigned int prev_state,
>>  					      struct task_struct *p)
>>  {
>>  	unsigned int state;
>>  #ifdef CONFIG_SCHED_DEBUG
>> -	BUG_ON(p != current);
>> +	WARN_ON_ONCE(p != current);
>>  #endif /* CONFIG_SCHED_DEBUG */
>
> The above needs to be a separate patch.

I've moved this to a new patch, and this is the changelog:
sched, tracing: change BUG_ON to WARN_ON_ONCE in __trace_sched_switch_state

The BUG_ON() here dates back to 2014. Switch it
to WARN_ON_ONCE() so that a sched-out task that is
unexpectedly not current no longer crashes the
kernel, as suggested by Steven.

Signed-off-by: Ze Gao [off-list ref]
Regards,
Ze

>>  /*
>> @@ -229,23 +229,23 @@ TRACE_EVENT(sched_switch,
>>  	TP_ARGS(preempt, prev, next, prev_state),
>>
>>  	TP_STRUCT__entry(
>> -		__array(	char,	prev_comm,	TASK_COMM_LEN	)
>>  		__field(	pid_t,	prev_pid			)
>> -		__field(	int,	prev_prio			)
>> -		__field(	long,	prev_state			)
>> -		__array(	char,	next_comm,	TASK_COMM_LEN	)
>>  		__field(	pid_t,	next_pid			)
>> -		__field(	int,	next_prio			)
>> +		__field(	short,	prev_prio			)
>> +		__field(	short,	next_prio			)
>> +		__array(	char,	prev_comm,	TASK_COMM_LEN	)
>> +		__array(	char,	next_comm,	TASK_COMM_LEN	)
>> +		__field(	short,	prev_state			)
>>  	),
>>
>>  	TP_fast_assign(
>> -		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
>> -		__entry->prev_pid	= prev->pid;
>> -		__entry->prev_prio	= prev->prio;
>> -		__entry->prev_state	= __trace_sched_switch_state(preempt, prev_state, prev);
>> +		__entry->prev_pid	= prev->pid;
>> +		__entry->next_pid	= next->pid;
>> +		__entry->prev_prio	= (short) prev->prio;
>> +		__entry->next_prio	= (short) next->prio;
>>  		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
>> -		__entry->next_pid	= next->pid;
>> -		__entry->next_prio	= next->prio;
>> +		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
>> +		__entry->prev_state	= __trace_sched_switch_state(preempt, prev_state, prev);
>>  		/* XXX SCHED_DEADLINE */
>>  	),