Thread (49 messages) 49 messages, 6 authors, 2020-11-04

Re: [PATCH bpf-next 08/11] libbpf: support BTF dedup of split BTFs

From: Song Liu <hidden>
Date: 2020-11-03 05:59:35
Also in: bpf

On Nov 2, 2020, at 9:25 PM, Andrii Nakryiko [off-list ref] wrote:

On Mon, Nov 2, 2020 at 6:49 PM Song Liu [off-list ref] wrote:
quoted

quoted
On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko [off-list ref] wrote:

Add support for deduplication split BTFs. When deduplicating split BTF, base
BTF is considered to be immutable and can't be modified or adjusted. 99% of
BTF deduplication logic is left intact (module some type numbering adjustments).
There are only two differences.

First, each type in base BTF gets hashed (expect VAR and DATASEC, of course,
those are always considered to be self-canonical instances) and added into
a table of canonical table candidates. Hashing is a shallow, fast operation,
so mostly eliminates the overhead of having entire base BTF to be a part of
BTF dedup.

Second difference is very critical and subtle. While deduplicating split BTF
types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD
types can and should be resolved to a full STRUCT/UNION type from the split
BTF part.  This is, obviously, can't happen because we can't modify the base
BTF types anymore. So because of that, any type in split BTF that directly or
indirectly references that newly-to-be-resolved FWD type can't be considered
to be equivalent to the corresponding canonical types in base BTF, because
that would result in a loss of type resolution information. So in such case,
split BTF types will be deduplicated separately and will cause some
duplication of type information, which is unavoidable.

With those two changes, the rest of the algorithm manages to deduplicate split
BTF correctly, pointing all the duplicates to their canonical counter-parts in
base BTF, but also is deduplicating whatever unique types are present in split
BTF on their own.

Also, theoretically, split BTF after deduplication could end up with either
empty type section or empty string section. This is handled by libbpf
correctly in one of previous patches in the series.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <redacted>

With some nits:
quoted
---
[...]
quoted
     /* remap string offsets */
     err = btf_for_each_str_off(d, strs_dedup_remap_str_off, d);
@@ -3553,6 +3582,63 @@ static bool btf_compat_fnproto(struct btf_type *t1, struct btf_type *t2)
     return true;
}
An overview comment about bpf_deup_prep() will be great.
ok
quoted
quoted
+static int btf_dedup_prep(struct btf_dedup *d)
+{
+     struct btf_type *t;
+     int type_id;
+     long h;
+
+     if (!d->btf->base_btf)
+             return 0;
+
+     for (type_id = 1; type_id < d->btf->start_id; type_id++)
+     {
Move "{" to previous line?
yep, my bad
quoted
quoted
+             t = btf_type_by_id(d->btf, type_id);
+
+             /* all base BTF types are self-canonical by definition */
+             d->map[type_id] = type_id;
+
+             switch (btf_kind(t)) {
+             case BTF_KIND_VAR:
+             case BTF_KIND_DATASEC:
+                     /* VAR and DATASEC are never hash/deduplicated */
+                     continue;
[...]
quoted
     /* we are going to reuse hypot_map to store compaction remapping */
     d->hypot_map[0] = 0;
-     for (i = 1; i <= d->btf->nr_types; i++)
-             d->hypot_map[i] = BTF_UNPROCESSED_ID;
+     /* base BTF types are not renumbered */
+     for (id = 1; id < d->btf->start_id; id++)
+             d->hypot_map[id] = id;
+     for (i = 0, id = d->btf->start_id; i < d->btf->nr_types; i++, id++)
+             d->hypot_map[id] = BTF_UNPROCESSED_ID;
We don't really need i in the loop, shall we just do
       for (id = d->btf->start_id; id < d->btf->start_id + d->btf->nr_types; id++)
?
I prefer the loop with i iterating over the count of types, it seems
more "obviously correct". For simple loop like this I could do

for (i = 0; i < d->btf->nr_types; i++)
   d->hypot_map[d->start_id + i] = ...;

But for the more complicated one below I found that maintaining id as
part of the for loop control block is a bit cleaner. So I just stuck
to the consistent pattern across all of them.
How about 

	for (i = 0; i < d->btf->nr_types; i++) {
		id = d->start_id + i;
		...
?

I would expect for loop with two loop variable to do some tricks, like two 
termination conditions, or another conditional id++ somewhere in the loop. 

Thanks,
Song
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help