Thread: 50 messages, 6 authors, 2019-06-18

Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use

From: Kris Van Hees <hidden>
Date: 2019-05-24 05:11:16
Also in: bpf, lkml

On Thu, May 23, 2019 at 07:02:43PM -0400, Steven Rostedt wrote:
> On Thu, 23 May 2019 14:13:31 -0700
> Alexei Starovoitov [off-list ref] wrote:
>
> > > In DTrace, people write scripts based on UAPI-style interfaces and they don't
> > > have to concern themselves with e.g. knowing how to get the value of the 3rd
> > > argument that was passed by the firing probe.  All they need to know is that
> > > the probe will have a 3rd argument, and that the 3rd argument to *any* probe
> > > can be accessed as 'arg2' (or args[2] for typed arguments, if the provider is
> > > capable of providing that).  Different probes have different ways of passing
> > > arguments, and only the provider code for each probe type needs to know how
> > > to retrieve the argument values.
> > >
> > > Does this help bring clarity to the reasons why an abstract (generic) probe
> > > concept is part of DTrace's design?
> >
> > It actually sounds worse than I thought.
> > If dtrace script reads some kernel field it's considered to be uapi?! ouch.
> > It means dtrace development philosophy is incompatible with the linux kernel.
> > There is no way kernel is going to bend itself to make dtrace scripts
> > runnable if that means that all dtrace accessible fields become uapi.
>
> Now from what I'm reading, it seems that the Dtrace layer may be
> abstracting out fields from the kernel. This is actually something I
> have been thinking about to solve the "tracepoint abi" issue. There's
> usually basic ideas that happen. An interrupt goes off, there's a
> handler, etc. We could abstract that out that we trace when an
> interrupt goes off and the handler happens, and record the vector
> number, and/or what device it was for. We have tracepoints in the
> kernel that do this, but they do depend a bit on the implementation.
> Now, if we could get a layer that abstracts this information away from
> the implementation, then I think that's a *good* thing.

This is indeed what DTrace uses.  When a probe triggers (be it kprobe, network
event, tracepoint, etc), the core execution component is invoked with a probe
id, and a set of data items.  In its current implementation (not BPF based),
when the probe triggers, a probe-type-specific handler is called in
the provider module for that probe type.  The handler determines the probe id
(e.g. for a kprobe that might be based on the program counter value), and it
also prepares the list of data items (which we call arguments to the probe).
It then calls the execution component with the probe id and arguments.

All probe types are handled by a provider, and each provider has a handler
that determines the probe id and arguments, and then calls the execution
component.  So, at the level of the execution component all probes look the
same.
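
As a rough illustration of that flow, here is a minimal user-space sketch in C.
It is not taken from the DTrace sources: every name in it (exec_probe,
kprobe_like_handler, tracepoint_like_handler, the fake_* structures, the probe
ids and the pc table) is hypothetical and exists only to show two providers
with different raw inputs feeding one common entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 4 /* hypothetical limit, chosen for this sketch */

/* Everything the execution component ever sees about a firing probe. */
struct probe_event {
	uint32_t probe_id;
	size_t   nargs;
	uint64_t args[MAX_ARGS]; /* arg0, arg1, ... */
};

/* Stands in for "run the script": we only record what was delivered. */
static struct probe_event last_event;

/* Execution component: one entry point, identical for every probe type. */
static void exec_probe(uint32_t probe_id, const uint64_t *args, size_t nargs)
{
	assert(nargs <= MAX_ARGS);
	last_event.probe_id = probe_id;
	last_event.nargs = nargs;
	for (size_t i = 0; i < nargs; i++)
		last_event.args[i] = args[i];
}

/* Provider A (kprobe-like): the probe id is looked up from the program
 * counter, and the arguments are read from saved registers. */
struct fake_regs {
	uint64_t pc;
	uint64_t reg[MAX_ARGS];
};

struct pc_map {
	uint64_t pc;
	uint32_t probe_id;
};

static const struct pc_map pc_table[] = {
	{ 0x1000, 1 },
	{ 0x2000, 2 },
};

static void kprobe_like_handler(const struct fake_regs *regs)
{
	for (size_t i = 0; i < sizeof(pc_table) / sizeof(pc_table[0]); i++) {
		if (pc_table[i].pc == regs->pc) {
			exec_probe(pc_table[i].probe_id, regs->reg, MAX_ARGS);
			return;
		}
	}
	/* No probe is registered at this pc, so nothing fires. */
}

/* Provider B (tracepoint-like): the probe id is fixed, and the arguments
 * are unpacked from an event record. */
struct fake_irq_record {
	uint32_t vector;
	uint32_t device;
};

#define IRQ_PROBE_ID 100u

static void tracepoint_like_handler(const struct fake_irq_record *rec)
{
	uint64_t args[2] = { rec->vector, rec->device };

	exec_probe(IRQ_PROBE_ID, args, 2);
}
```

Only the two handlers know where the probe id and the arguments come from;
exec_probe() receives the same shape of data in both cases, which is why a
script can refer to arg2 without knowing which kind of probe fired.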

Scripts commonly operate on the abstract probe, but script writers can opt
to do fancier things that do depend on probe implementation details.  In
that case, there is of course no guarantee that the script will keep working
as kernel releases change.

> > In stark contrast to dtrace all of bpf tracing scripts (bcc scripts
> > and bpftrace scripts) are written for specific kernel with intimate
> > knowledge of kernel details. They do break all the time when kernel changes.
> > kprobe and tracepoints are NOT uapi. All of them can change.
> > tracepoints are a bit more stable than kprobes, but they are not uapi.
>
> I wish that was totally true, but tracepoints *can* be an abi. I had
> code reverted because powertop required one to be a specific format. To
> this day, the wakeup event has a "success" field that writes in a
> hardcoded "1", because there's tools that depend on it, and they only
> work if there's a success field and the value is 1.
>
> I do definitely agree with you that the Dtrace code shall *never* keep
> the kernel from changing. That is, if Dtrace depends on something that
> changes (let's say we record priority of a task, but someday priority
> is replaced by something else), then Dtrace must cope with it. It must
> not be a blocker like user space applications can be.

I fully agree that DTrace or any other tool should never prevent changes from
happening at the kernel level.  Even in its current (non-BPF) implementation
it has had to cope with changes.  The abstraction through the providers has
been a real benefit for that because changes to probe mechanisms can be dealt
with at the level of the providers, and everything else can remain the same
because the abstraction "hides" the implementation details.
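
To make the priority example from above concrete, here is a small C sketch of
a provider absorbing such a change.  It is purely illustrative: the two task
layouts, the PRIO_MAX constant, the weight-to-priority mapping and all
function names are assumptions made up for this sketch, not kernel or DTrace
code.

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_MAX 139 /* hypothetical upper bound used by this sketch */

/* Two hypothetical kernel layouts: the older one stores a priority
 * (lower value = more urgent), the newer one replaced it with a weight
 * (higher value = more urgent, range 0..PRIO_MAX). */
struct task_v1 {
	int prio;
};

struct task_v2 {
	unsigned int weight;
};

/* Provider built against the old layout: arg0 is the stored priority. */
static uint64_t sched_arg0_v1(const struct task_v1 *t)
{
	assert(t->prio >= 0 && t->prio <= PRIO_MAX);
	return (uint64_t)t->prio;
}

/* Provider built against the new layout: derive a value with the same
 * meaning, so arg0 keeps its documented semantics.  The mapping is
 * invented for the sketch. */
static uint64_t sched_arg0_v2(const struct task_v2 *t)
{
	assert(t->weight <= PRIO_MAX);
	return (uint64_t)(PRIO_MAX - t->weight);
}

/* Consumer of the abstract probe: identical on both kernels, because it
 * only ever looks at arg0. */
static int is_high_priority(uint64_t arg0)
{
	return arg0 < 50;
}
```

The only code that differs between the two layouts is the provider function;
the consumer is untouched, and the kernel was free to drop the old field.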

	Kris