Thread: 28 messages, 4 authors, 2022-11-02

Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols

From: Leizhen (ThunderTown) <hidden>
Date: 2022-10-26 06:45:03
Also in: live-patching, lkml


On 2022/10/26 1:53, Luis Chamberlain wrote:
> On Wed, Oct 19, 2022 at 10:11:58PM +0800, Leizhen (ThunderTown) wrote:
>>
>> On 2022/10/19 20:01, Luis Chamberlain wrote:
>>> On Mon, Oct 17, 2022 at 02:49:39PM +0800, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>> comparison. This is very slow.
>>>>
>>>> In fact, we can first compress the name being looked up and then use
>>>> it for comparison when traversing 'kallsyms_names'.
>>>>
>>>> This patch series optimizes the performance of function kallsyms_lookup_name(),
>>>> and function klp_find_object_symbol() in the livepatch module. Based on the
>>>> test results, the performance overhead is reduced to 5%. That is, the
>>>> performance of these functions is improved by 20 times.
>>>
>>> Stupid question, is a hash table in order?
>>
>> No hash table.
>>
>> All symbols are arranged in ascending order of address. For example: cat /proc/kallsyms
>>
>> The addresses of all symbols are stored in kallsyms_addresses[], and names of all symbols
>> are stored in kallsyms_names[]. The elements in these two arrays are in a one-to-one
>> relationship. For any symbol, it has the same index in both arrays.
>>
>> Therefore, when we look up a symbolic name based on an address, we use a binary lookup.
>> However, when we look up an address based on a symbol name, we can only traverse array
>> kallsyms_names[] in sequence. I think the reason why hash is not used is to save memory.
>
> This answers how we don't use a hash table, the question was *should* we
> use one?

I'm not the original author, so I can only answer based on my own understanding. Maybe
the original author didn't think of using a hash, or maybe he weighed the trade-offs
and decided against it.

A hash is a good solution if only performance matters and memory overhead is not a
concern. Using a hash would increase the memory size by up to "4 * kallsyms_num_syms +
4 * ARRAY_SIZE(hashtable)" bytes, where kallsyms_num_syms is about 1-2 million.
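
To put a number on that estimate, here is a minimal sketch of the arithmetic. The
function name and the bucket count used below are hypothetical, and reading the two
terms as "one 4-byte link per symbol plus one 4-byte head per bucket" is my
assumption about how such a table would be laid out.

```c
#include <stdint.h>

/*
 * Upper bound on the extra memory for a hash index, following the estimate
 * "4 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)" bytes.
 * Assumption: 4 bytes per symbol (chain link) and 4 bytes per bucket (head).
 */
uint64_t hash_index_overhead_bytes(uint64_t num_syms, uint64_t num_buckets)
{
	return 4 * num_syms + 4 * num_buckets;
}
```

For example, 1,000,000 symbols and a hypothetical 65536-bucket table give
4,262,144 bytes, a little over 4 MiB.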

Because I don't know which hash algorithm would be used, the cost of computing the
hash value for a symbol name is unknown for now, but I think it would be small.
It definitely needs to be a simple algorithm, though, because the tool has to
implement the same hash algorithm.
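
As a rough illustration of what "simple enough to implement twice" could look like,
the sketch below uses 32-bit FNV-1a and a chained index. FNV-1a is only my pick for
the example, not something proposed in this series. The names sym_hash(),
build_index() and lookup_index() are hypothetical, and the symbol names are plain
C strings here rather than the compressed form stored in kallsyms_names[].

```c
#include <stdint.h>
#include <string.h>

#define NO_SYM UINT32_MAX	/* marks an empty bucket or the end of a chain */

/* 32-bit FNV-1a: short enough to duplicate in a build tool and in the kernel. */
uint32_t sym_hash(const char *name)
{
	uint32_t h = 2166136261u;

	while (*name) {
		h ^= (unsigned char)*name++;
		h *= 16777619u;
	}
	return h;
}

/* Build step (the tool's side): link every symbol index into its bucket. */
void build_index(const char *const *names, uint32_t num_syms,
		 uint32_t *heads, uint32_t num_buckets, uint32_t *next)
{
	uint32_t i;

	for (i = 0; i < num_buckets; i++)
		heads[i] = NO_SYM;
	for (i = 0; i < num_syms; i++) {
		uint32_t b = sym_hash(names[i]) % num_buckets;

		next[i] = heads[b];	/* push onto the front of the chain */
		heads[b] = i;
	}
}

/* Lookup step: walk one chain; returns the symbol index or NO_SYM. */
uint32_t lookup_index(const char *const *names, const uint32_t *heads,
		      uint32_t num_buckets, const uint32_t *next,
		      const char *name)
{
	uint32_t i = heads[sym_hash(name) % num_buckets];

	while (i != NO_SYM && strcmp(names[i], name) != 0)
		i = next[i];
	return i;
}
```

The returned index could then be used directly on an address array, since a symbol
has the same index in both arrays. This layout costs 4 bytes per symbol for next[]
plus 4 bytes per bucket for heads[], which has the same form as the estimate above.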

If the hash is not very uniform or ARRAY_SIZE(hashtable) is small, my current
approach still makes sense. So maybe the hash can be deferred to the next phase of
improvement.

>   Luis
.
-- 
Regards,
  Zhen Lei