Thread: 82 messages, 10 authors, 2018-02-13

Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2018-02-03 19:02:24
Also in: lkml

On Sat, 3 Feb 2018 17:04:14 +0000 (UTC)
Mathieu Desnoyers [off-list ref] wrote:

> The approach proposed here will introduce an expectation that internal
> function signatures never change in the kernel, else it would break user-space
> tools hooking on those functions.

I had this exact discussion with Linus. Linus, please correct me if I'm
wrong.

This is a case where he said if someone expected a function to be
there, then too bad. Functions can come and go depending on if gcc
inlines it or not. We already have this interface today. It's the
function tracer. One could argue a tool requires a function to exist
because it depends on a function being accessible to the function
tracer.

> The instrumentation infrastructure provided by this patchset might be useful
> for "one off" scripts, but it does not address the "stable instrumentation"
> expectations issue.

Actually, it could work for adding a tracepoint.

> The problem today is caused by widely used trace analysis tools that cannot
> cope with changes to the kernel instrumentation, do not report the
> instrumentation mismatch compared to their expectations, and we generally
> don't expect users to ever update those tools to deal with newer kernels. Having
> those tools hook on function names/arguments will not make this magically go
> away. As soon as kernel code changes, widely used trace analysis tools will
> start breaking left and right, and we will be back to square one. Only this time,
> it's the internal function signature which will have become an ABI.

Those that were asking about having "trace markers" (i.e. Facebook)
told us they can cope with kernel changes.

If a user can't cope with the changes, then they need to have their own
custom kernels.

> A possible solution to this problem appears if we start considering trace
> analysis tools as just that: "tooling", with the following properties:
>
> 1) Tools need to validate that the instrumentation provided matches their
>    expectations. This can be done by checking event/field names and/or version.
>    Tools that fail to do that should be fixed.
>
> 2) Tools need to report to the user when the instrumentation does not match
>    their expectations, and hint users to upgrade in order to deal with change.
>
> 3) Tools need to be backward compatible with respect to instrumentation: a
>    user switching between older and newer kernels should not have to keep
>    various copies of their tooling stack (graphical UI, analysis scripts,
>    and so on).
>
> If our goal is really to address this "stable instrumentation" issue, I don't
> think hooking on functions helps in any way. I hope we can work on defining
> instrumentation interface rules in order to deal with the fundamental problem
> of requiring tooling to adapt to kernel changes.

I think you may have mistaken my goal. It was not to establish stable
instrumentation. In fact, it was the exact opposite. It was a way to
avoid stable instrumentation but still be able to add trace events.

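For tools that do want to follow points 1) and 2) of the list quoted
above, the check could look roughly like the sketch below. It assumes the
tool reads the text of a tracefs events/<system>/<event>/format file,
where each field appears as a "field:<C declaration>;" entry. The
EXPECTED_FIELDS table and the parse_field_names() and check_event()
helpers are hypothetical names made up for illustration, not part of any
existing tool or of this patch set. The sketch only compares field names
and returns messages instead of failing silently; it does not attempt the
backward compatibility asked for in point 3).

```python
import re

# Hypothetical expectation table: event name -> field names the tool relies on.
EXPECTED_FIELDS = {
    "sched_switch": {"prev_comm", "prev_pid", "next_comm", "next_pid"},
}

# Matches the "field:<C declaration>;" entries of a tracefs format file.
FIELD_RE = re.compile(r"^\s*field:(?P<decl>[^;]+);", re.MULTILINE)


def parse_field_names(format_text):
    """Return the set of field names declared in the text of a format file."""
    names = set()
    for match in FIELD_RE.finditer(format_text):
        decl = match.group("decl").strip()
        # The name is the last token of the declaration; drop any array
        # suffix such as "[16]" and any leading pointer star.
        name = decl.split()[-1].split("[")[0].lstrip("*")
        names.add(name)
    return names


def check_event(event, format_text, expected=EXPECTED_FIELDS):
    """Return a list of problems to show the user; empty means a match."""
    missing = sorted(expected[event] - parse_field_names(format_text))
    return [
        f"{event}: expected field '{name}' not found; "
        "the kernel instrumentation changed, please upgrade this tool"
        for name in missing
    ]
```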
The issue is that people are afraid to add tracepoints into their
subsystem because they are afraid that they will become stable and
limit their own development. The problem is that this hurts those who
want to trace these subsystems and are perfectly fine with the
tracepoints going away and with having to change their tools afterwards.
This change set was to help those that can customize their tools with
new kernels. It was not to help those that just want their tools to
work with all kernels.

With that said, this actually can help those who want stable
infrastructure as well. If there happens to be a function that is
constantly used to create a dynamic function based event, that usage can
be shown to the subsystem maintainer when asking them to add a static
tracepoint there, as evidence that it is very useful to have.

One problem we are having today is that too many trace events are being
created, and a lot of them have been used once and never used again.
People don't care about them. I want to slow down
the addition of trace events if these function events can be used
instead. And when they are not good enough, or we see that one is
constantly being added, then we will know that we can add a trace event
that would be useful in the future.

-- Steve