Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols
From: Leizhen (ThunderTown) <hidden>
Date: 2022-10-26 06:45:03
Also in:
live-patching, lkml
On 2022/10/26 1:53, Luis Chamberlain wrote:
> On Wed, Oct 19, 2022 at 10:11:58PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/10/19 20:01, Luis Chamberlain wrote:
>>> On Mon, Oct 17, 2022 at 02:49:39PM +0800, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in 'kallsyms_names' one by one, and then use the expanded string for comparison. This is very slow.
>>>>
>>>> In fact, we can first compress the name being looked up and then use it for comparison when traversing 'kallsyms_names'.
>>>>
>>>> This patch series optimizes the performance of function kallsyms_lookup_name(), and function klp_find_object_symbol() in the livepatch module. Based on the test results, the performance overhead is reduced to 5%. That is, the performance of these functions is improved by 20 times.
>>>
>>> Stupid question, is a hash table in order?
>>
>> No hash table.
>>
>> All symbols are arranged in ascending order of address. For example:
>> cat /proc/kallsyms
>>
>> The addresses of all symbols are stored in kallsyms_addresses[], and the names of all symbols are stored in kallsyms_names[]. The elements in these two arrays are in a one-to-one relationship: each symbol has the same index in both arrays. Therefore, when we look up a symbol name based on an address, we use binary search. However, when we look up an address based on a symbol name, we can only traverse kallsyms_names[] in sequence.
>>
>> I think the reason why a hash is not used is to save memory.
>
> This answers how we don't use a hash table, the question was *should* we use one?
I'm not the original author, so I can only answer based on my understanding. Maybe the original author didn't think of hashing, or considered it and decided against it.

A hash table is a good solution if only performance matters and memory overhead is not a concern. It would increase memory usage by up to "4 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)" bytes, where kallsyms_num_syms is about 1-2 million.

Since I don't know which hash algorithm would be used, the cost of computing the hash value of a symbol name is unknown for now, but I expect it to be small. It would definitely need to be a simple algorithm, because the tool that generates the kallsyms data would have to implement the same hash algorithm.

If the hash is not very uniform, or ARRAY_SIZE(hashtable) is small, my current approach still makes sense. So perhaps hashing can be deferred to the next phase of improvement.
--
Regards,
  Zhen Lei