Thread: 22 messages, 7 authors, 2022-03-03

Re: BTF compatibility issue across builds

From: Michal Suchánek <hidden>
Date: 2022-02-14 20:46:43
Also in: bpf

On Sun, Feb 13, 2022 at 04:40:44PM +0100, Toke Høiland-Jørgensen wrote:
Shung-Hsi Yu [off-list ref] writes:
quoted
On Sat, Feb 12, 2022 at 12:58:51AM +0100, Toke Høiland-Jørgensen wrote:
quoted
Andrii Nakryiko [off-list ref] writes:
quoted
On Fri, Feb 11, 2022 at 9:20 AM Toke Høiland-Jørgensen [off-list ref] wrote:
quoted
Andrii Nakryiko [off-list ref] writes:
quoted
On Thu, Feb 10, 2022 at 2:01 AM Michal Suchánek [off-list ref] wrote:
quoted
Hello,

On Mon, Jan 31, 2022 at 09:36:44AM -0800, Yonghong Song wrote:
quoted

On 1/27/22 7:10 AM, Shung-Hsi Yu wrote:
quoted
Hi,

We recently ran into module load failures related to split BTF on openSUSE
Tumbleweed[1], which I believe is something that may also happen on other
rolling distros.

The error looks like the following (though the failure is not limited to ipheth):

     BPF:[103111] STRUCT BPF:size=152 vlen=2 BPF: BPF:Invalid name BPF:

     failed to validate module [ipheth] BTF: -22

The error comes down to trying to load BTF of *kernel modules from a
different build* than the runtime kernel (but the source is the same), where
the base BTF of the two builds is different.

While it may be too far stretched to call this a bug, solving this might
make BTF adoption easier. I'd naively think that we could further split
base BTF into two parts to avoid this issue, where .BTF only contains exported
types, and the other (still residing in vmlinux) holds the unexported types.
What are the exported types? The types used by exported symbols?
This for sure will increase BTF handling complexity.
And it will not actually help.

We have modversion ABI which checks the checksum of the symbols that the
module imports and fails the load if the checksum for these symbols does
not match. It's not concerned with symbols not exported, it's not
concerned with symbols not used by the module. This is something that is
sustainable across kernel rebuilds with minor fixes/features and what
distributions watch for.
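
A minimal sketch of the per-symbol check described above, as a toy model
rather than the kernel's actual implementation. The symbol names and CRC
values are made up for illustration, and modversions_ok is a hypothetical
helper:

```python
# Toy model of a modversions-style check: only the symbols a module
# imports are compared. All CRC values here are hypothetical.

def modversions_ok(kernel_crcs, module_imports):
    """Return True if every imported symbol exists with the same CRC."""
    return all(kernel_crcs.get(sym) == crc
               for sym, crc in module_imports.items())

kernel_a = {"netif_rx": 0x1111, "usb_submit_urb": 0x2222, "kmalloc": 0x3333}
imports = {"netif_rx": 0x1111, "usb_submit_urb": 0x2222}

# Rebuild that changes a symbol the module does not import: still loads.
kernel_b = dict(kernel_a, kmalloc=0x9999)
unrelated_change_ok = modversions_ok(kernel_b, imports)

# Rebuild that changes a symbol the module does import: load is refused.
kernel_c = dict(kernel_a, netif_rx=0x4444)
used_change_ok = modversions_ok(kernel_c, imports)

print(unrelated_change_ok, used_change_ok)
```

The point of the sketch is only that a change outside the module's import
set does not affect the outcome.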

Now with BTF the situation is vastly different. There are at least three
bugs:

 - The BTF check is global for all symbols, not for the symbols the
   module uses. This is not sustainable. Given that BTF is supposed to
   allow linking BPF programs that were built in a completely different
   environment with the kernel, it is completely within the scope of BTF
   to solve this problem; it's just neglected.
You refer to BTF use in CO-RE with the latter. It's just one
application of BTF and it doesn't follow that you can do the same with
module BTF. It's not a neglect, it's a very big technical difficulty.

Each module's BTFs are designed as logical extensions of vmlinux BTF.
And each module BTF is independent and isolated from other modules'
extensions of the same vmlinux BTF. The way that BTF format is
designed, any tiny difference in vmlinux BTF effectively invalidates
all modules' BTFs and they have to be rebuilt.

Imagine that only one BTF type is added to vmlinux BTF. Last BTF type
ID in vmlinux BTF is shifted from, say, 1000 to 1001. While previously
every module's BTF type ID started with 1001, now they all have to
start with 1002 and be shifted by 1.

Now let's say that the order of two BTF types in vmlinux BTF is
changed, say type 10 becomes type 20 and type 20 becomes type 10 (just
because of slight difference in DWARF, for instance). Any type
reference to 10 or 20 in any module BTF has to be renumbered now.
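
The type ID problem can be sketched with a toy model. This is not the
real BTF encoding: types are plain strings, and module_first_id, resolve
and the example type lists are hypothetical names used only for
illustration. The one thing taken from BTF itself is that ID 0 is
reserved for void, so base types are numbered from 1 and a module's
types continue after the last base ID:

```python
# Toy model of split BTF type numbering (not the real on-disk format).

def module_first_id(base_types):
    """First type ID available to a module layered on top of base_types."""
    return len(base_types) + 1

def resolve(type_id, base_types, module_types):
    """Look up a type ID: base IDs come first, module IDs continue after."""
    if 1 <= type_id <= len(base_types):
        return base_types[type_id - 1]
    return module_types[type_id - len(base_types) - 1]

# vmlinux BTF the module was built against.
base_at_build = ["int", "long", "struct sk_buff"]
module_types = ["struct ipheth_device"]

own_id = module_first_id(base_at_build)  # the module's own struct
member_ref = 3                           # refers to struct sk_buff at build time

# vmlinux BTF of the running kernel: one extra type ended up in the middle.
base_at_runtime = ["int", "char", "long", "struct sk_buff"]

print(resolve(member_ref, base_at_build, module_types))
print(resolve(member_ref, base_at_runtime, module_types))
print(resolve(own_id, base_at_runtime, module_types))
```

Against the runtime base, the stale reference lands on a different type
and the module's own first ID now collides with a base type, which is
why every reference would have to be renumbered.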

Another one: let's say we add a new string to vmlinux BTF string
section somewhere at the beginning, say "abc" at offset 100. Any
string offset after 100 now has to be shifted *both* in vmlinux BTF
and all module BTFs. And any string reference in module BTFs has
to be adjusted as well, because now each module's BTF's logical string
offsets start 4 logical bytes higher (due to "abc\0" being
added and shifting everything right).
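
The same effect for string offsets, again as a toy model with
hypothetical helper names (build_strtab, read_str) and made-up strings.
It assumes only that the string section is a run of NUL-terminated
strings beginning with the empty string at offset 0:

```python
# Toy model of a BTF-like string section and what one added string does.

def build_strtab(strings):
    """Return (blob, offsets) for a NUL-terminated string table."""
    blob = b"\0"  # offset 0 holds the empty string
    offsets = {}
    for s in strings:
        offsets[s] = len(blob)
        blob += s.encode() + b"\0"
    return blob, offsets

def read_str(blob, off):
    """Read the NUL-terminated string starting at byte offset off."""
    end = blob.index(b"\0", off)
    return blob[off:end].decode()

base_build, off_build = build_strtab(["int", "sk_buff"])
base_runtime, off_runtime = build_strtab(["abc", "int", "sk_buff"])

# Adding "abc\0" shifts every later string by 4 bytes.
shift = off_runtime["sk_buff"] - off_build["sk_buff"]

# An offset recorded at build time now names a different string.
stale = off_build["sk_buff"]
print(shift, read_str(base_build, stale), read_str(base_runtime, stale))

# Module strings logically start where the base section ends, so the
# module's starting offset moves by the same 4 bytes.
module_start_build = len(base_build)
module_start_runtime = len(base_runtime)
print(module_start_build, module_start_runtime)
```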

As you can see, any tiny change in vmlinux BTF, no matter where,
beginning, middle, or end, causes massive changes in type IDs and
offsets everywhere. It's impractical to do any local adjustments; it's
much simpler and more reliable to regenerate BTF completely.
This seems incredibly brittle, though? IIUC this means that if you want
BTF in your modules you *must* have not only the kernel headers of the
kernel it's going to run on, but the full BTF information for the exact
From BTF perspective, only vmlinux BTF. Having exact kernel headers
would minimize type information duplication.
Right, I meant you'd need the kernel headers to compile the module, and
the vmlinux BTF to build the module BTF info.
quoted
quoted
kernel image you're going to load that module on? How is that supposed
to work for any kind of environment where everything is not built
together? Third-party modules for distribution kernels is the obvious
example that comes to mind here, but as this thread shows, they don't
necessarily even have to be third party...

How would you go about "completely regenerating BTF" in practice for a
third-party module, say?
Great questions. I was kind of hoping you'll have some suggestions as
well, though. Not just complaints.
Well, I kinda took your "not really a bug either" comment to mean you
weren't really open to changing the current behaviour. But if that was a
misunderstanding on my part, I do have one thought:

The "partial BTF" thing in the modules is done to save space, right?
I.e., in principle there would be nothing preventing a module from
including a full (self-contained) set of BTF in its .ko when it is
compiled? Because if so, we could allow that as an optional mode that
can be enabled if you don't mind taking the size hit (any idea how large
that usually is, BTW?).
This seems quite nice IMO, as no change needs to be made on the generation
side of existing BTF tooling. I tested it out on openSUSE Tumbleweed 5.16.5
kernel modules and, for the sake of completeness, included both the case
where BTF is stripped and the case using a pre-trained zstd dictionary.

Configuration                                    Size    vs. partial BTF

Uncompressed, no BTF                             362MiB  -27%
Uncompressed, partial BTF                        499MiB  +0%
Uncompressed, self-contained BTF                1026MiB  +105%

Zstd compressed, no BTF                           95MiB  -35%
Zstd compressed, partial BTF                     147MiB  +0%
Zstd compressed, self-contained BTF              361MiB  +145%
Zstd compressed (trained), self-contained BTF    299MiB  +103%

So we'd expect quite a hit, as the size of the kernel modules would roughly double.
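
For reference, the percentages can be reproduced from the sizes in the
table above, taking the partial BTF row of each group as the baseline.
This is a sketch of the arithmetic only. The rel_change helper is a
hypothetical name, and truncating toward zero is an assumption that
happens to match the posted figures (rounding to nearest would give
+106% and +146% for two rows, possibly because the MiB values are
themselves rounded):

```python
# Relative size change versus the partial BTF baseline, in percent.

def rel_change(size_mib, baseline_mib):
    """Percent change versus baseline, truncated toward zero."""
    return int((size_mib - baseline_mib) * 100 / baseline_mib)

uncompressed = {"no BTF": 362, "partial BTF": 499, "self-contained BTF": 1026}
zstd = {"no BTF": 95, "partial BTF": 147, "self-contained BTF": 361,
        "self-contained BTF (trained dictionary)": 299}

for label, group in (("Uncompressed", uncompressed), ("Zstd", zstd)):
    base = group["partial BTF"]
    for name, size in group.items():
        print(f"{label}, {name}: {size}MiB {rel_change(size, base):+d}%")

# Extra disk space for self-contained BTF over partial BTF, compressed.
extra_zstd = zstd["self-contained BTF"] - zstd["partial BTF"]
print(extra_zstd)
```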

For server and workstation environments, an additional ~200MiB of disk space
seems like a tolerable trade-off if it can get third-party kernel modules to
work. But I cannot speak for other kinds of use cases.
Well, there are also in-between tradeoffs (i.e., you can build a subset
of the modules with self-contained BTF and a subset with partial BTF
depending on what fits your build environment).
As for that, you would typically want in-tree modules with partial BTF.
It's a bug if they don't match, and if you can ignore the non-matching
BTF you should be able to boot a system that is functional enough to
re-install the kernel. Today nothing critical depends on CO-RE.

On the other hand, if you build something out-of-tree, be it virtualbox
or some module updated with cutting-edge experimental changes, you will
likely want full BTF.

Thanks

Michal