Thread (31 messages), 11 authors, 2020-05-14

Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier support for it

From: Andrii Nakryiko <hidden>
Date: 2020-05-14 05:59:18
Also in: bpf

On Wed, May 13, 2020 at 2:59 PM Alan Maguire [off-list ref] wrote:
On Wed, 13 May 2020, Andrii Nakryiko wrote:
This commit adds a new MPSC ring buffer implementation to the BPF ecosystem,
which allows multiple CPUs to submit data to a single shared ring buffer. On
the consumption side, only a single consumer is assumed.

Motivation
----------
There are two distinct motivators for this work. Neither is satisfied by the
existing perf buffer, which prompted the creation of a new ring buffer
implementation:
  - more efficient memory utilization by sharing ring buffer across CPUs;
  - preserving ordering of events that happen sequentially in time, even
  across multiple CPUs (e.g., fork/exec/exit events for a task).

These two problems are independent, but the perf buffer solves neither of
them. Both are a result of the choice to have a per-CPU perf ring buffer. Both
can also be solved by having an MPSC implementation of the ring buffer. The
ordering problem could technically be solved for the perf buffer with some
in-kernel counting, but given that the first one requires an MPSC buffer, the
same solution would solve the second problem automatically.

This looks great, Andrii! One potentially interesting side-effect of
the way this is implemented is that it could (I think) support speculative
tracing.

Say I want to record some tracing info when I enter function foo(), but
I only care about cases where that function later returns an error value.
I _think_ your implementation could support that via a scheme like
this:

- attach a kprobe program to record the data via bpf_ringbuf_reserve(),
  and store the reserved pointer value in a per-task keyed hashmap.
  Then record the values of interest in the reserved space. This is our
  speculative data as we don't know whether we want to commit it yet.

- attach a kretprobe program that picks up our reserved pointer and
  commit()s or discard()s the associated data based on the return value.

- the consumer should (I think) then only read the committed data, so in
  this case just the data of interest associated with the failure case.

I'm curious if that sort of ringbuf access pattern across multiple
programs would work? Thanks!

Right now it's not allowed. Similar to spin locks and socket references,
the verifier enforces that a reserved record is committed or discarded
within the same BPF program invocation. Technically, nothing prevents
us from relaxing this and allowing this pointer to be stored in a map, but
that's probably way too dangerous and not necessary for most common
cases.

But all your troubles with this are due to using a pair of
kprobe+kretprobe. What I think should solve your problem is a single
fexit program. It can read the input arguments *and* the return value of
the traced function. So there won't be any need for an additional map or
for storing speculative data (and no speculation either, because you'll
know up front whether you need to capture data at all). Does this work
for your case?
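
To make the reserve/commit/discard semantics discussed above concrete, here is
a minimal single-threaded user-space model in plain C. It is only a sketch, not
the kernel implementation and not BPF code: every name in it (struct ringbuf,
rb_reserve(), rb_commit(), rb_discard(), rb_consume(), trace_on_exit()) is
hypothetical. It assumes the consumer skips discarded records and stops at the
first record that is still reserved, so that ordering is preserved. The
trace_on_exit() helper mimics the fexit-style flow: the argument and the return
value are both known, so the record is reserved and committed in one
invocation, and only for the failure case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the record life cycle; all names here are hypothetical. */
enum rec_state { REC_BUSY, REC_COMMITTED, REC_DISCARDED };

struct record {
	enum rec_state state;
	long payload;
};

#define RB_SLOTS 8

struct ringbuf {
	struct record slots[RB_SLOTS];
	size_t producer_pos;	/* index of the next slot to reserve */
	size_t consumer_pos;	/* index of the next slot to consume */
};

/* Reserve the next slot; returns NULL when the buffer is full. */
static struct record *rb_reserve(struct ringbuf *rb)
{
	if (rb->producer_pos - rb->consumer_pos == RB_SLOTS)
		return NULL;
	struct record *r = &rb->slots[rb->producer_pos++ % RB_SLOTS];
	r->state = REC_BUSY;
	return r;
}

/* Make a reserved record visible to the consumer. */
static void rb_commit(struct record *r)
{
	assert(r->state == REC_BUSY);
	r->state = REC_COMMITTED;
}

/* Mark a reserved record as unwanted; the consumer will skip it. */
static void rb_discard(struct record *r)
{
	assert(r->state == REC_BUSY);
	r->state = REC_DISCARDED;
}

/*
 * Copy up to max committed payloads into out, in reservation order.
 * Discarded records are skipped; a still-reserved record stops the scan
 * (assumed behaviour, so that later records cannot overtake it).
 */
static size_t rb_consume(struct ringbuf *rb, long *out, size_t max)
{
	size_t n = 0;

	while (rb->consumer_pos != rb->producer_pos && n < max) {
		struct record *r = &rb->slots[rb->consumer_pos % RB_SLOTS];

		if (r->state == REC_BUSY)
			break;
		if (r->state == REC_COMMITTED)
			out[n++] = r->payload;
		rb->consumer_pos++;
	}
	return n;
}

/* fexit-style flow: decide first, then reserve and commit in one call. */
static void trace_on_exit(struct ringbuf *rb, long arg, long ret)
{
	if (ret >= 0)
		return;		/* success: nothing to record */
	struct record *r = rb_reserve(rb);
	if (!r)
		return;		/* buffer full: drop the event */
	r->payload = arg;
	rb_commit(r);
}
```

In this model, a record that is reserved but neither committed nor discarded
holds back everything reserved after it, which is one way to see why letting a
reservation outlive a program invocation is risky. With trace_on_exit() no
record is ever left pending, because the decision is made before reserving.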
Alan
[...]

no one seems to like trimming emails ;)