Re: [PATCH v2] tracing: Add system trigger file to enable triggers for all the system's events
From: Steven Rostedt <rostedt@goodmis.org>
Date: 2025-11-26 14:38:43
Also in:
lkml
On Wed, 26 Nov 2025 15:02:51 +0900 Masami Hiramatsu (Google) [off-list ref] wrote:
quoted
This also allows to remove a trigger from all events in a subsystem (even if it's not a subsystem trigger!).I have some comments below.
BTW, it's more appropriate to simply trim the email ;-)
quoted
--- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c@@ -2168,51 +2168,52 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt, static LIST_HEAD(event_subsystems); -static int subsystem_open(struct inode *inode, struct file *filp) +struct trace_subsystem_dir *trace_get_system_dir(struct inode *inode) { - struct trace_subsystem_dir *dir = NULL, *iter_dir; - struct trace_array *tr = NULL, *iter_tr; - struct event_subsystem *system = NULL; - int ret; + struct trace_subsystem_dir *dir; + struct trace_array *tr = NULL;nit: This also no need to be initialized.
Hmm, I guess this was needed in one of the versions I had before posting. I'll fix in v3.
quoted
- if (tracing_is_disabled()) - return -ENODEV; + guard(mutex)(&event_mutex); + guard(mutex)(&trace_types_lock); /* Make sure the system still exists */ - mutex_lock(&event_mutex); - mutex_lock(&trace_types_lock); - list_for_each_entry(iter_tr, &ftrace_trace_arrays, list) { - list_for_each_entry(iter_dir, &iter_tr->systems, list) { - if (iter_dir == inode->i_private) { + list_for_each_entry(tr, &ftrace_trace_arrays, list) { + list_for_each_entry(dir, &tr->systems, list) { + if (dir == inode->i_private) { /* Don't open systems with no events */ - tr = iter_tr; - dir = iter_dir; - if (dir->nr_events) { - __get_system_dir(dir); - system = dir->subsystem; - } - goto exit_loop; + if (!dir->nr_events) + return NULL; + if (__trace_array_get(tr) < 0) + return NULL; + __get_system_dir(dir); + return dir; } } }
quoted
static ssize_t event_trigger_regex_write(struct file *file, const char __user *ubuf, size_t cnt, loff_t *ppos) { struct trace_event_file *event_file; ssize_t ret; - char *buf __free(kfree) = NULL; + char *buf __free(kfree) = get_user_buf(ubuf, cnt); - if (!cnt) + if (!buf) return 0; - if (cnt >= PAGE_SIZE) - return -EINVAL; - - buf = memdup_user_nul(ubuf, cnt); if (IS_ERR(buf)) return PTR_ERR(buf);You can simply write: if (IS_ERR_OR_NULL(buf)) return PTR_ERR_OR_ZERO(buf);
Yes I can. But honestly, the above is much harder to understand what is happening than the code I had written. I mean: if (!buf) return 0; if (IS_ERR(buf)) return PTR_ERR(buf); is pretty obvious of what is happening. if (IS_ERR_OR_NULL(buf)) return PTR_ERR_OR_ZERO(buf); Is quite a bit more obfuscated. I mean, I needed to look up what PTR_ERR_OR_ZERO() did to be sure I knew what was returned.
quoted
+ list_for_each_entry(file, &tr->events, list) { + + if (strcmp(system->name, file->event_call->class->system) != 0) + continue; + + ret = p->parse(p, file, buff, command, next); + + /* Removals and existing events do not error */ + if (ret < 0 && ret != -EEXIST && !remove) { + pr_warn("Failed adding trigger %s on %s\n", + command, trace_event_name(file->event_call)); + }Can I expect that this can recover the previous settings via event trigger? e.g. # echo "stacktrace" > events/sched/sched_wakeup/trigger # echo "stacktrace" > events/sched/trigger # echo "!stacktrace" > events/sched/trigger # cat events/sched/sched_wakeup/trigger stacktrace:unlimited ?
No. In fact, this is one of the features of the system trigger. Writing into the system/trigger file is the same as writing into each of the system's event's trigger files one at a time. In fact, I updated the documentation in this patch to show that this file can be used to clear tiggers too! + echo snapshot > events/sched/sched_waking/trigger + cat events/sched/sched_waking/trigger + snapshot:unlimited + echo '!snapshot' > events/sched/trigger + cat events/sched/sched_waking/trigger + # Available triggers: + # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
quoted
+ } + return 0; +} + +static ssize_t +event_system_trigger_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + struct trace_subsystem_dir *dir = filp->private_data; + struct event_command *p; + char *command, *next; + char *buf __free(kfree) = get_user_buf(ubuf, cnt); + bool remove = false; + bool found = false; + ssize_t ret; + + if (!buf) + return 0; + + if (IS_ERR(buf)) + return PTR_ERR(buf);Ditto.
And ditto again with my reply ;-)
quoted
+ + /* system triggers are not allowed to have counters */ + if (strchr(buf, ':')) + return -EINVAL;':' is not always used for counters (e.g. hist) and it seems odd to check anything about parse here. Can we do this counter check after parse a command?quoted
+ + /* If opened for read too, dir is in the seq_file descriptor */ + if (filp->f_mode & FMODE_READ) { + struct seq_file *m = filp->private_data; + dir = m->private; + } + + /* Skip added space at beginning of buf */ + next = strim(buf); + + command = strsep(&next, " \t"); + if (next) { + next = skip_spaces(next); + if (!*next) + next = NULL; + }strim() removes both leading and trailing whitespace. So this check is not required.
But next here is not the one that had strim() attached to it. command = strsep(&next, " \t"); Updates the content of next. Thanks for the review, -- Steve