Re: trace_printk issue. Was: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov <hidden>
Date: 2019-10-04 19:57:18
Also in: bpf, linux-api, linux-security-module
On 10/3/19 9:41 AM, Steven Rostedt wrote:
> On Thu, 3 Oct 2019 09:18:40 -0700 Alexei Starovoitov [off-list ref] wrote:
>> I think dropping last events is just as bad.
>> Is there a mode to overwrite old and keep the last N (like perf does) ?
>
> Well, it drops it by pages. Thus you should always have the last page
> of events.
>
>> Peter Wu brought this issue to my attention in
>> commit 55c33dfbeb83 ("bpf: clarify when bpf_trace_printk discards lines").
>> And later sent similar doc fix to ftrace.rst.
>
> It was documented there, he just elaborated on it more:
>
>   This file holds the output of the trace in a human
>   readable format (described below). Note, tracing is temporarily
> - disabled while this file is being read (opened).
> + disabled when the file is open for reading. Once all readers
> + are closed, tracing is re-enabled.
>
>> To be honest if I knew of this trace_printk quirk I would not have
>> picked it as a debugging mechanism for bpf.
>> I urge you to fix it.
>
> It's not a trivial fix by far. Note, trying to read the trace file
> without disabling the writes to it, will most likely make reading it
> when function tracing enabled totally garbage, as the buffer will most
> likely be filled for every read event. That is, each read event will not
> be related to the next event that is read, making it very confusing.
>
> Although, I may be able to make it work per page. That way you get at
> least a page worth of events.

That sounds much better. As long as trace_printk() doesn't disappear
into the void, it's good.

But the part I'm not getting is why trace_printk() has

	if (tracing_disabled)
		goto out;

It's a concurrent ring buffer. One cpu can write into it while another
reading. What is the point disabling trace_printk in particular?
Each __buffer_unlock_commit is an atomic ring buffer update, so a read
from trace will either see it as a whole or won't see it.

'trace_pipe' clearly works fine. Why 'trace' is any different?
Just keep tracing enabled and keep reading it until the end of the
current ring buffer. Whether open() determines current or it reads until
next=0 is an implementation detail.
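
To make the "whole event or nothing" argument concrete, here is a minimal user-space sketch. It is not the kernel ring buffer code and makes strong assumptions: one writer, one reader, fixed-size events. The names struct ring, ring_write() and ring_read() are hypothetical and exist only for this illustration. The writer fills a slot and then publishes it with a single release store; the reader only looks at slots below the published count, so it never observes a half-written event. When the writer has lapped the reader, the reader skips ahead and keeps the last SLOTS events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SLOTS   8   /* hypothetical capacity, kept small for the example */
#define MSG_LEN 32  /* hypothetical fixed event size */

struct event {
	char msg[MSG_LEN];
};

struct ring {
	struct event slot[SLOTS];
	atomic_size_t committed; /* count of fully published events */
	size_t read_pos;         /* reader-private cursor */
};

/* Writer: fill the slot first, then publish it with one release store. */
static void ring_write(struct ring *r, const char *msg)
{
	size_t head = atomic_load_explicit(&r->committed, memory_order_relaxed);
	struct event *e = &r->slot[head % SLOTS];

	assert(msg != NULL);
	strncpy(e->msg, msg, MSG_LEN - 1);
	e->msg[MSG_LEN - 1] = '\0';

	/* Until this store is visible, the reader cannot see the event. */
	atomic_store_explicit(&r->committed, head + 1, memory_order_release);
}

/* Reader: copy one event and return 1, or return 0 when caught up. */
static int ring_read(struct ring *r, struct event *out)
{
	size_t end = atomic_load_explicit(&r->committed, memory_order_acquire);

	if (r->read_pos == end)
		return 0;

	/* Writer lapped us: older events are gone, keep the last SLOTS. */
	if (end - r->read_pos > SLOTS)
		r->read_pos = end - SLOTS;

	*out = r->slot[r->read_pos % SLOTS];
	r->read_pos++;
	return 1;
}
```

Reading "until the end of the current ring buffer" then amounts to calling ring_read() until it returns 0. The sketch does not protect a slot from being overwritten while the reader is copying it, which a real implementation has to handle.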