Thread (20 messages) 20 messages, 3 authors, 2022-09-21

Re: [PATCH v2 1/8] scripts/kallsyms: don't compress symbol type when CONFIG_KALLSYMS_ALL=y

From: Petr Mladek <pmladek@suse.com>
Date: 2022-09-21 07:13:48
Also in: live-patching, lkml

On Wed 2022-09-21 10:42:56, Leizhen (ThunderTown) wrote:

On 2022/9/21 1:26, Petr Mladek wrote:
quoted
On Fri 2022-09-09 21:00:09, Zhen Lei wrote:
quoted
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. This is very slow.

In fact, we can first compress the name being looked up and then use
it for comparison when traversing 'kallsyms_names'.
This does not explain how this patch modifies the compressed data
and why it is needed.
Yes, I have updated the description from the v3 version.
Ah, there is even v4. I have missed that. The commit message looks
much better there.
So if we don't compress the symbol type, we can first compress the
searched symbol and then make a quick comparison based on the compressed
length and content. In this way, for entries with mismatched lengths,
there is no need to expand and compare strings. And for those matching
lengths, there's no need to expand the symbol. This saves a lot of time.
quoted
quoted
This increases the size of 'kallsyms_names'. About 48KiB, 2.67%, on x86
with defconfig.
Before: kallsyms_num_syms=131392, sizeof(kallsyms_names)=1823659
After : kallsyms_num_syms=131392, sizeof(kallsyms_names)=1872418

However, if CONFIG_KALLSYMS_ALL is not set, the size of 'kallsyms_names'
does not change.

Signed-off-by: Zhen Lei <redacted>
---
 scripts/kallsyms.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index f18e6dfc68c5839..ab6fe7cd014efd1 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
+	 */
+	if (all_symbols)
+		sym_start_idx = 1;
This looks a bit fragile. My understanding is that the new code in
kernel/kallsyms.c and kernel/module/kallsyms.c depends on this change.
They do not depend on this change, because the index in
insert_real_symbols_in_table() is still starting from 0. kallsyms_expand_symbol()
shows that it uses every byte of the compressed data to look up the token table.
The index in insert_real_symbols_in_table() starting from 0 make sure that the
raw character of 'type' occupies a separate position in kallsyms_token_table[].
So that kallsyms_expand_symbol() can still work well.
I guess that we are talking about different things. Anyway, please
ignore my concern about that it is fragile. The change in
scripts/kallsyms.c does not longer depend on --all-symbols parameter
in the last v4 patchset.
quoted
I would personally suggest to store the symbol type into a separate
sym->type entry in struct sym_entry and never compress it.
Yes,I've also considered this, for the purpose of increasing the
compression ratio. See below, if the sorting is performed based on
the address and then based on the type. We can record all the symbol
type information in less than 100 bytes. Of course, this makes the
functions that look up symbols based on the address loop serveral
times more. However, I would like to wait until the current patch
series is accepted. Otherwise, I'll have to rework a lot of patches
and it's too much work. To be honest, I've been coding for it these days.

cat /proc/kallsyms | awk '{print $2}' | sort | uniq -c | sort -r
  44678 r
  38299 t
  28315 T
  11644 d
   3768 D
   2778 b
    778 R
    641 B
    282 A
    178 W
     37 V
This is another optimization. I agree that we could do it later after
this patchset is accepted.

Best Regards,
Petr
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help