Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

From: Andrii Nakryiko <hidden>
Date: 2019-06-03 21:55:07
Also in: bpf

On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski
[off-list ref] wrote:

On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:

quoted

On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev [off-list ref] wrote:

quoted

On 05/31, Andrii Nakryiko wrote:

quoted

This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.

My 2c: this is too magical and relies on me knowing the expected fields.
(also, the compiler won't be able to help with the misspellings).

I have mixed feelings, too.  Especially the key and value fields are
very non-idiomatic for C :(  They never hold any value or data, while
the other fields do.  That feels so awkward.  I'm no compiler expert,
but even something like:

struct map_def {
        void *key_type_ref;
} mamap = {
        .key_type_ref = &(struct key_xyz){},
};

Would feel like less of a hack to me, and then map_def doesn't have to
be different for every map.  But yea, IDK if it's easy to (a) resolve
the type of what key_type points to, or (b) how to do this for scalar
types.

The syntax for scalar would be &(int){0}, that compiles.

But there are a bunch of things that make it infeasible. So let's take
an example and see what's happening:

/* huge struct */
struct custom {int a; int b; int c; int d[1000000];};

struct {
        void *key;
        void *value;
} new_map = {
        .key = &(int){0},
        .value = &(struct custom){},
};

If we dump BTF, here's what we get:

$ bpftool btf dump file tail_call_test.o
[1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] FUNC 'main' type_id=1
[4] VAR '.compoundliteral' type_id=0, linkage=static
[5] VAR '.compoundliteral.1' type_id=0, linkage=static
[6] STRUCT '(anon)' size=24 vlen=3
        'type' type_id=2 bits_offset=0
        'key' type_id=7 bits_offset=64
        'value' type_id=7 bits_offset=128
[7] PTR '(anon)' type_id=0
[8] VAR 'new_map' type_id=6, linkage=global-alloc
[9] DATASEC '.bss' size=0 vlen=2
        type_id=4 offset=0 size=4
        type_id=5 offset=4 size=4000012
[10] DATASEC '.maps' size=0 vlen=1
        type_id=8 offset=0 size=24

So notice how we get two .bss entries, one for 4 bytes (for key, var
'compoundliteral') and another for 4MB (for huge struct, var
'.compoundliteral.1'). So while this won't increase the size of ELF,
it will force a huge .bss (and corresponding global data map) to be
created, which is no good.

Also, notice how there is no type information associated with [4] and
[5] vars, they are just of type void. There is no type information
about struct custom at all, though it might be (?) possible to fix it
by modifying compiler to preserve more type information.

So while the second one is a technical hurdle, which we might overcome
(not sure, actually), the issue with big .BSS is a showstopper for
some applications.

To eliminate .BSS issue, we'd need something like this to capture type
information:

struct {
        void *key;
        void *value;
} new_map = {
        .key = (int)0,
        .value = (struct custom *)0,
};

But that doesn't capture any type information for those type casts at
all, so more compiler work (if at all possible).

Which is why I think capturing type information using a standard
non-convoluted C way/syntax using a field declaration is the most
reliable, simple, and clean way. You do intialize key/value, it's just
a NULL pointer to corresponding type:

struct {
        int type;
        int *key;
        struct custom *value;
} new_map __attribute__((section(".maps"), used)) = {
        .type = 2,
        .key = (int)0,
        .value = (struct custom *)NULL,
};


Notice, btw, that this approach doesn't prevent you to re-use struct
definitions for multiple maps, if they have identical key/value types
or if you are not capturing type information at all.

struct my_typical_map {
        int type;
        int max_entries;
        u64 *key;
        struct custom *value;
};

struct my_typical_map map1 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 10,
};

struct my_typical_map map2 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 20,
};

Or, you can just re-use struct bpf_map_def today like this (but you
won't have type info for key/value, of course):

struct bpf_map_def my_map_without_type_info SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 100,
        .key_size = sizeof(u64),
        .value_size = sizeof(struct custom),
};

This approach gives you as much flexibility as possible, you only will
have to have different definition struct, if you have different
key/value type (in C++ that would be solved by templates, but alas we
are in C land).

quoted

I don't think it's really worse than current bpf_map_def approach. In
typical scenario, there are only two fields you need to remember: type
and max_entries (notice, they are called exactly the same as in
bpf_map_def, so this knowledge is transferrable). Then you'll have
key/value, using which you are describing both type (using field's
type) and size (calculated from the type).

I can relate a bit to that with bpf_map_def you can find definition
and see all possible fields, but one can also find a lot of examples
for new map definitions as well.

One big advantage of this scheme, though, is that you get that type
association automagically without using BPF_ANNOTATE_KV_PAIR hack,
with no chance of having a mismatch, etc. This is less duplication (no
need to do sizeof(struct my_struct) and struct my_struct as an arg to
that macro) and there is no need to go and ping people to add those
annotations to improve introspection of BPF maps.

quoted

Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. Old
libbpf implementation will ignore fields it doesn't recognize, while new
implementations will parse and recognize new optional attributes.

I also don't know how to feel about old libbpf ignoring some attributes.
In the kernel we require that the unknown fields are zeroed.
We probably need to do something like that here? What do you think
would be a good example of an optional attribute?

Ignoring is required for forward-compatibility, where old libbpf will
be used to load newer user BPF programs. We can decided not to do it,
in that case it's just a question of erroring out on first unknown
field. This RFC was posted exactly to discuss all these issues with
more general community, as there is no single true way to do this.

As for examples of when it can be used. It's any feature that can be
considered optional or a hint, so if old libbpf doesn't do that, it's
still not the end of the world (and we can live with that, or can
correct using direct libbpf API calls).

On forward compatibility my 0.02c would be - if we want to go there
and silently ignore fields it'd be good to have some form of "hard
required" bit.  For TLVs ABIs it can be a "you have to understand
this one" bit, for libbpf perhaps we could add a "min libbpf version
required" section?  That kind of ties us ELF formats to libbpf
specifics (the libbpf version presumably would imply support for
features), but I think we want to go there, anyway.

I think we can go with strict/non-strict mode, which we already
support in libbpf with MAPS_RELAX_COMPAT flag (see
__bpf_object__open_xattr), would that work?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help