Thread: 13 messages, 5 authors, 2021-03-17

Re: [PATCH v2 3/4] kbuild: re-implement CONFIG_TRIM_UNUSED_KSYMS to make it work in one-pass

From: Nicolas Pitre <nico@fluxnic.net>
Date: 2021-03-09 19:55:41
Also in: linux-kbuild, lkml

On Wed, 10 Mar 2021, Masahiro Yamada wrote:
On Wed, Mar 10, 2021 at 2:36 AM Nicolas Pitre [off-list ref] wrote:
On Wed, 10 Mar 2021, Masahiro Yamada wrote:
Commit a555bdd0c58c ("Kbuild: enable TRIM_UNUSED_KSYMS again, with some
guarding") re-enabled this feature, but Linus is still unhappy about
the build time.

The reason for the slowness is the recursion: this basically works in
two loops.

In the first loop, Kbuild builds the entire tree based on the temporary
autoksyms.h, which contains macro defines to control whether their
corresponding EXPORT_SYMBOL() is enabled or not, and also gathers all
symbols required by modules. After the tree traversal, Kbuild updates
autoksyms.h and triggers the second loop to rebuild source files whose
EXPORT_SYMBOL() needs flipping.
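
As a rough illustration of why the preprocessor-based scheme needs a second loop, the sketch below gates an export on a macro taken from a generated header. It is a simplified stand-in, not the kernel's actual export.h: KSYM_foo, EXPORT_ENABLED() and export_table are hypothetical names, and the placeholder trick is only modelled on the kernel's __is_defined() helper.

```c
#include <assert.h>

/* Hypothetical stand-in for a generated autoksyms.h: only "foo" is
 * required by some module, so only its macro is defined. */
#define KSYM_foo 1

/* Evaluates to 1 if the argument is a macro defined to 1, else to 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED__(val) IS_DEFINED_(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x) IS_DEFINED__(x)

/* Hypothetical gate: is the export for "sym" enabled in this build? */
#define EXPORT_ENABLED(sym) IS_DEFINED(KSYM_##sym)

/* The decision is made while preprocessing each source file. */
static_assert(EXPORT_ENABLED(foo) == 1, "foo export is enabled");
static_assert(EXPORT_ENABLED(bar) == 0, "bar export is trimmed");

struct export_entry {
	const char *name;
	int enabled;
};

/* Toy export table; a real build would emit or omit ksymtab entries. */
const struct export_entry export_table[] = {
	{ "foo", EXPORT_ENABLED(foo) },
	{ "bar", EXPORT_ENABLED(bar) },
};
```

Because EXPORT_ENABLED() is resolved at preprocessing time, changing the set of KSYM_* defines after the first traversal means recompiling every file whose result changes, which is the second loop described above.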

This commit re-implements CONFIG_TRIM_UNUSED_KSYMS to make it work in
one pass. In the new design, unneeded EXPORT_SYMBOL() instances are
trimmed by the linker instead of the preprocessor.

After the tree traversal, a linker script snippet <generated/keep-ksyms.h>
is generated. It feeds the list of necessary sections to vmlinux.lds.S
and modules.lds.S. The other sections fall into /DISCARD/.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
I'm not sure I do understand every detail here, especially since it is
so far away from the version that I originally contributed. But the
concept looks good.

I still think that there is no way around a recursive approach to get
the maximum effect with LTO, but given that true LTO still isn't applied
to mainline after all those years, the recursive approach brings
nothing. Maybe that could be revisited if true LTO ever makes it into
mainline, and the desire to reduce the binary size is still relevant
enough to justify it.
Hmm, I am confused.

Does this patch change the behavior in the
combination with the "true LTO"?

Please let me borrow this sentence from your article:
"But what LTO does is more like getting rid of branches that simply
float in the air without being connected to anything or which have
become loose due to optimization."
(https://lwn.net/Articles/746780/)

This patch throws unneeded EXPORT_SYMBOL metadata
into the /DISCARD/ section of the linker script.

The approach is different (preprocessor vs linker), but
we will still get the same result; the unneeded
EXPORT_SYMBOLs are disconnected from the main trunk.

Then, the true LTO will remove branches floating in the air,
right?

So, what will be lost by this patch?
Let's say you have this in module_foo:

int foo(int x)
{
	return 2 + bar(x);
}
EXPORT_SYMBOL(foo);

And module_bar:

int bar(int y)
{
	return 3 * baz(y);
}
EXPORT_SYMBOL(bar);

And this in the main kernel image:

int baz(int z)
{
	return plonk(z);
}
EXPORT_SYMBOL(baz);

Now we build the kernel and modules. Then we realize that nothing 
references symbol "foo". We can trim the "foo" export. But it would be 
necessary to recompile module_foo with LTO (especially if there is 
some other code in that module) to realize that nothing 
references foo() any longer and optimize away the reference to bar(). 
With another round, we now realize that the "bar" export is no longer 
necessary. But that will require another compile round to optimize away 
the reference to baz(). And then a final compilation round with 
LTO to possibly optimize plonk() out of the kernel.

I don't see how you can propagate all this chain reaction with only one 
pass.
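
To make the chain reaction concrete, here is a minimal sketch in C of the iteration described above. It is a toy model, not kernel code: the symbol table, the unit names, and the helpers relink(), trim_unused_exports(), trim_to_fixed_point() and count_live() are all hypothetical, and "LTO" is reduced to the rule that a function survives only if it is still exported or is called by a surviving function in the same unit.

```c
#include <assert.h>
#include <stdbool.h>

enum { FOO, BAR, BAZ, PLONK, NSYMS };
enum { MODULE_FOO, MODULE_BAR, VMLINUX };
#define NO_CALLEE (-1)

struct sym {
	int unit;	/* which link unit defines the function */
	int callee;	/* the single function it calls, or NO_CALLEE */
	bool exported;	/* EXPORT_SYMBOL() still in effect */
	bool live;	/* survived the last (simulated) LTO link */
};

/* foo -> bar -> baz -> plonk, as in the example above. */
void init_syms(struct sym *s)
{
	s[FOO]   = (struct sym){ MODULE_FOO, BAR,       true,  false };
	s[BAR]   = (struct sym){ MODULE_BAR, BAZ,       true,  false };
	s[BAZ]   = (struct sym){ VMLINUX,    PLONK,     true,  false };
	s[PLONK] = (struct sym){ VMLINUX,    NO_CALLEE, false, false };
}

/* Simulated LTO link: keep exported functions and their same-unit callees. */
void relink(struct sym *s)
{
	int sweep, i;

	for (i = 0; i < NSYMS; i++)
		s[i].live = s[i].exported;

	/* A call chain is at most NSYMS long, so NSYMS sweeps suffice. */
	for (sweep = 0; sweep < NSYMS; sweep++) {
		for (i = 0; i < NSYMS; i++) {
			int c = s[i].callee;

			assert(c == NO_CALLEE || (c >= 0 && c < NSYMS));
			if (s[i].live && c != NO_CALLEE && s[c].unit == s[i].unit)
				s[c].live = true;
		}
	}
}

/* One build round: relink, then drop exports no live code in another
 * unit refers to. Returns the number of exports trimmed. */
int trim_unused_exports(struct sym *s)
{
	bool needed[NSYMS] = { false };
	int i, trimmed = 0;

	relink(s);
	for (i = 0; i < NSYMS; i++) {
		int c = s[i].callee;

		if (s[i].live && c != NO_CALLEE && s[c].unit != s[i].unit)
			needed[c] = true;
	}
	for (i = 0; i < NSYMS; i++) {
		if (s[i].exported && !needed[i]) {
			s[i].exported = false;
			trimmed++;
		}
	}
	return trimmed;
}

/* Repeat build rounds until nothing changes; returns the number of
 * rounds that trimmed something, and leaves the final link state in s. */
int trim_to_fixed_point(struct sym *s)
{
	int rounds = 0;

	while (trim_unused_exports(s) > 0)
		rounds++;
	relink(s);
	return rounds;
}

int count_live(const struct sym *s)
{
	int i, n = 0;

	for (i = 0; i < NSYMS; i++)
		n += s[i].live;
	return n;
}
```

In this model a single call to trim_unused_exports() removes only the "foo" export, and a relink afterwards still keeps bar(), baz() and plonk(). trim_to_fixed_point() needs three trimming rounds before every export is gone and nothing is left alive.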


Nicolas