Re: [PATCH v2 1/8] scripts/kallsyms: don't compress symbol type when CONFIG_KALLSYMS_ALL=y
From: Petr Mladek <pmladek@suse.com>
Date: 2022-09-21 07:13:48
Also in:
live-patching, lkml
On Wed 2022-09-21 10:42:56, Leizhen (ThunderTown) wrote:
On 2022/9/21 1:26, Petr Mladek wrote:quoted
On Fri 2022-09-09 21:00:09, Zhen Lei wrote:quoted
Currently, to search for a symbol, we need to expand the symbols in 'kallsyms_names' one by one, and then use the expanded string for comparison. This is very slow. In fact, we can first compress the name being looked up and then use it for comparison when traversing 'kallsyms_names'.This does not explain how this patch modifies the compressed data and why it is needed.Yes, I have updated the description from the v3 version.
Ah, there is even v4. I have missed that. The commit message looks much better there.
So if we don't compress the symbol type, we can first compress the searched symbol and then make a quick comparison based on the compressed length and content. In this way, for entries with mismatched lengths, there is no need to expand and compare strings. And for those matching lengths, there's no need to expand the symbol. This saves a lot of time.quoted
quoted
This increases the size of 'kallsyms_names'. About 48KiB, 2.67%, on x86 with defconfig. Before: kallsyms_num_syms=131392, sizeof(kallsyms_names)=1823659 After : kallsyms_num_syms=131392, sizeof(kallsyms_names)=1872418 However, if CONFIG_KALLSYMS_ALL is not set, the size of 'kallsyms_names' does not change. Signed-off-by: Zhen Lei <redacted> --- scripts/kallsyms.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index f18e6dfc68c5839..ab6fe7cd014efd1 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c + */ + if (all_symbols) + sym_start_idx = 1;This looks a bit fragile. My understanding is that the new code in kernel/kallsyms.c and kernel/module/kallsyms.c depends on this change.They do not depend on this change, because the index in insert_real_symbols_in_table() is still starting from 0. kallsyms_expand_symbol() shows that it uses every byte of the compressed data to look up the token table. The index in insert_real_symbols_in_table() starting from 0 make sure that the raw character of 'type' occupies a separate position in kallsyms_token_table[]. So that kallsyms_expand_symbol() can still work well.
I guess that we are talking about different things. Anyway, please ignore my concern about that it is fragile. The change in scripts/kallsyms.c does not longer depend on --all-symbols parameter in the last v4 patchset.
quoted
I would personally suggest to store the symbol type into a separate sym->type entry in struct sym_entry and never compress it.Yes,I've also considered this, for the purpose of increasing the compression ratio. See below, if the sorting is performed based on the address and then based on the type. We can record all the symbol type information in less than 100 bytes. Of course, this makes the functions that look up symbols based on the address loop serveral times more. However, I would like to wait until the current patch series is accepted. Otherwise, I'll have to rework a lot of patches and it's too much work. To be honest, I've been coding for it these days. cat /proc/kallsyms | awk '{print $2}' | sort | uniq -c | sort -r 44678 r 38299 t 28315 T 11644 d 3768 D 2778 b 778 R 641 B 282 A 178 W 37 V
This is another optimization. I agree that we could do it later after this patchset is accepted. Best Regards, Petr