Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
From: Leonardo Brás <hidden>
Date: 2023-05-10 20:28:30
Also in:
lkml
On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
quoted hunk ↗ jump to hunk
+Daniel On 03/05/23 16:59, Peter Zijlstra wrote:quoted
On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:quoted
IIUC the last commits add tracepoints that are collected in the requesting CPU, at the moment of scheduling the IPI, which are also useful in some scenarios. On my scenario, it could help a little, since it makes possible to filter what all other cpus are scheduling on the requested cpu. OTOH it could be also be misleading, as the requested cpu could be running something that was scheduled way before. The change I propose does exactly what my scenario need: track exactly which function was running at given time in the requested CPU. With this info, we can check which (if any) remotely requested function was running on given time window.I was thinking you could simply (graph)-trace __flush_smp_call_function_queue() with a max_graph_depth or so (Steve says that ought to work). But even that might be too specific, your use case sounds more like what we have the irq-off latency tracer for, and that thing will immediately tell you what functions were being ran.quoted
(An unrelated thing I just thought: We could even use the commits you pointed together with my proposed change in order to measure how long does it take for a requested function to run / complete in the requested cpu)I don't think you could actually do that; the send tracepoints Valentin added don't log the csd address, this means you cannot distinguish two CSDs with the same function send from different CPUs. To do this you'd need to add the csd address to the the ipi_send tracepoints and your own (possibly replacing info -- because I don't think that's too useful). Valentin -- is any of this something you'd also find useful?Conceptually yeah, however: With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over, you interrupt the userspace task and you eat the full meal that is NOHZ_FULL kernel entry. Pretty much any such IPI will show up when evaluating your setup with rteval/rlta/whatever, so having an event at IPI reception is redundant. IIUC Leonardo's looking at a setup with isolcpus but not necessarily NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the isolated task over its deadline/period. I would argue any IPI received there is at risk of doing that, so it's the same fight as with NOHZ_FULL. With that said, I suppose this could still be helpful for e.g. osnoise to hook into and point the finger at which CPU/context sent the problematic IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery times. One thing though is that trace_ipi_send_cpu*() is not used solely for CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*(). Something like so... ---diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h new file mode 100644 index 0000000000000..8fc725a2b45b9 --- /dev/null +++ b/include/trace/events/smp.h@@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM smp + +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_SMP_H + +#include <linux/tracepoint.h> + +TRACE_EVENT(csd_queue_cpu, + + TP_PROTO(const unsigned int cpu, + unsigned long callsite, + smp_call_func_t func, + call_single_data_t *csd), + + TP_ARGS(cpu, callsite, func, csd), + + TP_STRUCT__entry( + __field(unsigned int, cpu) + __field(void *, callsite) + __field(void *, func) + __field(void *, csd) + ), + + TP_fast_assign( + __entry->cpu = cpu; + __entry->callsite = (void *)callsite; + __entry->func = func; + __entry->csd = csd; + ), + + TP_printk("cpu=%u callsite=%pS func=%pS csd=%p", + __entry->cpu, __entry->callsite, __entry->func, __entry->csd) +);
This is for the caller side, right?
+
+DECLARE_EVENT_CLASS(csd_function,
+
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+
+ TP_ARGS(func, csd),
+
+ TP_STRUCT__entry(
+ __field(void *, func)
+ __field(void *, csd)
+ ),
+
+ TP_fast_assign(
+ __entry->func = func;
+ __entry->csd = csd;
+ ),
+
+ TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_entry,
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+ TP_ARGS(func, csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_exit,
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+ TP_ARGS(func, csd)
+);Oh, this is what event_class is for. Thanks for the example :)
quoted hunk ↗ jump to hunk
+ +#endif /* _TRACE_SMP_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h>diff --git a/kernel/smp.c b/kernel/smp.c index ab3e5dad6cfe9..7d28db303e9bc 100644 --- a/kernel/smp.c +++ b/kernel/smp.c@@ -27,6 +27,9 @@ #include <linux/jump_label.h> #include <trace/events/ipi.h> +#define CREATE_TRACE_POINTS +#include <trace/events/smp.h> +#undef CREATE_TRACE_POINTS #include "smpboot.h" #include "sched/smp.h"@@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask) arch_send_call_function_ipi_mask(mask); } +static __always_inline void +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd) +{ + trace_csd_function_entry(func, csd); + func(info); + trace_csd_function_exit(func, csd); +} +
Good one, a helper to avoid calling those traces everywhere. Honest question: Since info == csd->info and func == csd->func, we could just pass csd, right? I suppose the suggestion on the 3-argument version is to use the values already fetched from memory instead of fetching them again. Is that correct?
quoted hunk ↗ jump to hunk
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);@@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node) * even if we haven't sent the smp_call IPI yet (e.g. the stopper * executes migration_cpu_stop() on the remote CPU). */ - if (trace_ipi_send_cpu_enabled()) { + if (trace_csd_queue_cpu_enabled()) { call_single_data_t *csd; smp_call_func_t func;@@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node) func = CSD_TYPE(csd) == CSD_TYPE_TTWU ? sched_ttwu_pending : csd->func; - trace_ipi_send_cpu(cpu, _RET_IP_, func); + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd); } /*@@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd) csd_lock_record(csd); csd_unlock(csd); local_irq_save(flags); - func(info); + csd_do_func(func, info, NULL); csd_lock_record(NULL); local_irq_restore(flags); return 0;@@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline) } csd_lock_record(csd); - func(info); + csd_do_func(func, info, csd); csd_unlock(csd); csd_lock_record(NULL); } else {@@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline) csd_lock_record(csd); csd_unlock(csd); - func(info); + csd_do_func(func, info, csd); csd_lock_record(NULL); } else if (type == CSD_TYPE_IRQ_WORK) { irq_work_single(csd);@@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline) /* * Third; only CSD_TYPE_TTWU is left, issue those. */ - if (entry) - sched_ttwu_pending(entry); + if (entry) { + csd = llist_entry(entry, typeof(*csd), node.llist); + csd_do_func(sched_ttwu_pending, entry, csd); + } }@@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single); /** * smp_call_function_single_async() - Run an asynchronous function on a - * specific CPU. + * specific CPU. * @cpu: The CPU to run on. * @csd: Pre-allocated and setup data structure *@@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, int cpu, last_cpu, this_cpu = smp_processor_id(); struct call_function_data *cfd; bool wait = scf_flags & SCF_WAIT; - int nr_cpus = 0, nr_queued = 0; + int nr_cpus = 0; bool run_remote = false; bool run_local = false;@@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask, csd->node.src = smp_processor_id(); csd->node.dst = cpu; #endif + + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd); + if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) { __cpumask_set_cpu(cpu, cfd->cpumask_ipi); nr_cpus++; last_cpu = cpu; } - nr_queued++; } - /* - * Trace each smp_function_call_*() as an IPI, actual IPIs - * will be traced with func==generic_smp_call_function_single_ipi(). - */ - if (nr_queued) - trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func); - /* * Choose the most efficient way to send an IPI. Note that the * number of CPUs might be zero due to concurrent changes to the@@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask, unsigned long flags; local_irq_save(flags); - func(info); + csd_do_func(func, info, NULL); local_irq_restore(flags); }
I will rebase my patch on top of tip/smp/core and apply the suggested changes. A v2 will be send shortly. Thank you all for reviewing! Leo