Thread (31 messages) 31 messages, 11 authors, 2020-05-14

Re: [PATCH bpf-next 0/6] BPF ring buffer

From: Jonathan Lemon <hidden>
Date: 2020-05-14 16:37:23
Also in: bpf

On Wed, May 13, 2020 at 11:08:46PM -0700, Andrii Nakryiko wrote:
On Wed, May 13, 2020 at 3:49 PM Jonathan Lemon [off-list ref] wrote:
quoted
On Wed, May 13, 2020 at 12:25:26PM -0700, Andrii Nakryiko wrote:
quoted
Implement a new BPF ring buffer, as presented at BPF virtual conference ([0]).
It presents an alternative to perf buffer, following its semantics closely,
but allowing sharing same instance of ring buffer across multiple CPUs
efficiently.

Most patches have extensive commentary explaining various aspects, so I'll
keep cover letter short. Overall structure of the patch set:
- patch #1 adds BPF ring buffer implementation to kernel and necessary
  verifier support;
- patch #2 adds litmus tests validating all the memory orderings and locking
  is correct;
- patch #3 is an optional patch that generalizes verifier's reference tracking
  machinery to capture type of reference;
- patch #4 adds libbpf consumer implementation for BPF ringbuf;
- path #5 adds selftest, both for single BPF ring buf use case, as well as
  using it with array/hash of maps;
- patch #6 adds extensive benchmarks and provide some analysis in commit
  message, it build upon selftests/bpf's bench runner.

  [0] https://docs.google.com/presentation/d/18ITdg77Bj6YDOH2LghxrnFxiPWe0fAqcmJY95t_qr0w

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Jonathan Lemon <redacted>
Looks very nice!  A few random questions:

1) Why not use a structure for the header, instead of 2 32bit ints?
hm... no reason, just never occurred to me it's necessary :)
It might be clearer to do this.  Something like:

struct ringbuf_record {
    union {
        struct {
            u32 size:30;
            bool busy:1;
            bool discard:1;
        };
        u32 word1;
    };
    union {
        u32 pgoff;
        u32 word2;
    };
};

While perhaps a bit overkill, makes it clear what is going on.

quoted
2) Would it make sense to reserve X bytes, but only commit Y?
   the offset field could be used to write the record length.

   E.g.:
      reserve 512 bytes    [BUSYBIT | 512][PG OFFSET]
      commit  400 bytes    [ 512 ] [ 400 ]
It could be done, though I had tentative plans to use those second 4
bytes for something useful eventually.

But what's the use case? From ring buffer's perspective, X bytes were
reserved and are gone already and subsequent writers might have
already advanced producer counter with the assumption that all X bytes
are going to be used. So there are no space savings, even if record is
discarded or only portion of it is submitted. I can only see a bit of
added convenience for an application, because it doesn't have to track
amount of actual data in its record. But this doesn't seem to be a
common case either, so not sure how it's worth supporting... Is there
a particular case where this is extremely useful and extra 4 bytes in
record payload is too much?
Not off the top of my head - it was just the first thing that came to
mind when reading about the commit/discard paradigm.  I was thinking
about variable records, where the maximum is reserved, but less data
is written.  But there's no particular reason for the ringbuffer to
track this either, it could be part of the application framing.

quoted
3) Why have 2 separate pages for producer/consumer, instead of
   just aligning to a smp cache line (or even 1/2 page?)
Access rights restrictions. Consumer page is readable/writable,
producer page is read-only for user-space. If user-space had ability
to write producer position, it could wreck a huge havoc for the
ringbuf algorithm.
Ah, thanks, that makes sense.  Might want to add a comment to
that effect, as it's different from other implementations.
-- 
Jonathan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help