Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols
From: Leizhen (ThunderTown) <hidden>
Date: 2022-10-31 15:04:51
Also in:
live-patching, lkml
On 2022/10/31 12:55, Leizhen (ThunderTown) wrote:
On 2022/10/29 16:10, Leizhen (ThunderTown) wrote:quoted
On 2022/10/27 14:27, Leizhen (ThunderTown) wrote:quoted
On 2022/10/27 11:26, Leizhen (ThunderTown) wrote:quoted
On 2022/10/27 3:03, Luis Chamberlain wrote:quoted
On Wed, Oct 26, 2022 at 02:44:36PM +0800, Leizhen (ThunderTown) wrote:quoted
On 2022/10/26 1:53, Luis Chamberlain wrote:quoted
This answers how we don't use a hash table, the question was *should* we use one?I'm not the original author, and I can only answer now based on my understanding. Maybe the original author didn't think of the hash method, or he has weighed it out. Hash is a good solution if only performance is required and memory overhead is not considered. Using hash will increase the memory size by up to "4 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)" bytes, kallsyms_num_syms is about 1-2 million.Sorry, 1-2 million ==> 0.1~0.2 millionquoted
quoted
quoted
Because I don't know what hash algorithm will be used, the cost of generating the hash value corresponding to the symbol name is unknown now. But I think it's gonna be small. But it definitely needs a simpler algorithm, the tool needs to implement the same hash algorithm.For instance, you can look at evaluating if alloc_large_system_hash() would help.The following three hash algorithms are compared. The kernel is compiled by defconfig on arm64. |---------------------------------------------------------------------------------------| | | hash &= HASH_TABLE_SIZE - 1 | | | number of conflicts >= 1000 | |---------------------------------------------------------------------------------------| | ARRAY_SIZE(hash_table) | crc16 | jhash_one_at_a_time | string hash_32 | |---------------------------------------------------------------------------------------| | | 345b: 3905 | 0d40: 1013 | 4a4c: 6548 | | | 35bb: 1016 | 35ce: 6549 | 883a: 1015 | | 0x10000 | 385b: 6548 | 4440: 19126 | d05f: 19129 | | | f0ba: 19127 | 7ebe: 3916 | ecda: 3903 | |---------------------------------------------------------------------------------------| | | 0ba: 19168 | 440: 19165 | 05f: 19170 | | | 45b: 3955 | 5ce: 6577 | 83a: 1066 | | 0x1000 | 5bb: 1069 | d40: 1052 | a4c: 6609 | | | 85b: 6582 | ebe: 3938 | cda: 3924 | |---------------------------------------------------------------------------------------| Based on the above test results, I conclude that: 1. For the worst-case scenario, the three algorithms are not much different. But the kernel only implements crc16 and string hash_32. The latter is not processed byte-by-byte, so it is coupled with byte order and sizeof(long). So crc16 is the best choice. 2. For the worst-case scenario, there are almost 19K strings are mapped to the same hash value,just over 1/10 of the total. And with my current compression-then-comparison approach, it's 25-30 times faster. So there's still a need for my current approach, and they can be combined. if (nr_conflicts(key) >= CONST_N) { newname = compress(name); for_each_name_in_slot(key): compare(new_name) } else { for_each_name_in_slot(key): compare(name) } Above CONST_N can be roughly calculated: time_of_compress(name) + N * time_of_compare(new_name) <= N * time_of_compare(name) 3. For the worst-case scenario, there is no obvious difference between ARRAY_SIZE(hash_table) 0x10000 and 0x1000. So ARRAY_SIZE(hash_table)=0x1000 is enough. Statistic information: |------------------------------------------------------| | nr_conflicts(key) | ARRAY_SIZE(hash_table) | |------------------------------------------------------| | <= ? | 0x1000 | 0x10000 | |------------------------------------------------------| | 0 | 0 | 7821 | | 20 | 19 | 57375 | | 40 | 2419 | 124 | | 60 | 1343 | 70 | | 80 | 149 | 73 | | 100 | 38 | 49 | | 200 | 108 | 16 | | 400 | 14 | 2 | | 600 | 2 | 2 | | 800 | 0 | 0 | | 1000 | 0 | 0 | | 100000 | 4 | 4 | |------------------------------------------------------| Also, I re-calculated: Using hash will increase the memory size by up to "6 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)" |---- What I said earlier was 4 The increased size is close to 1 MB if CONFIG_KALLSYMS_ALL=y. Hi, Luis: For the reasons of the above-mentioned second conclusion. And except for patches 4-6, even if only the hash method is used, other patches and option "--lto-clang" in patch 6/11 are also needed. If you don't mind, I'd like to use hash at the next stage. The current patch set is already huge.I just had an update in response to David Laight's email. The hash solution is like a centrist. It doesn't seem very feasible. Now, we need to make a decision. Choose one of the two: 1. Continue with my current approach. Improve the average performance of kallsyms_lookup_name() by 20 to 30 times. The memory overhead is increased by: arm64 (defconfig): 73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y. 19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n. x86 (defconfig): 49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y. 16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n. 2. Sort names, binary search (The static function causes duplicate names. Additional work is required) 2^18=262144, only up to 18 symbol expansions and comparisons are required. The performance is definitely excellent, although I haven't tested it yet. The memory overhead is increased by: 6 * kallsyms_num_syms arm64 (defconfig): 1MiB if CONFIG_KALLSYMS_ALL=y. 362KiB if CONFIG_KALLSYMS_ALL=n. x86 (defconfig): 770KiB if CONFIG_KALLSYMS_ALL=y. 356KiB if CONFIG_KALLSYMS_ALL=n.
Preliminary Test Results: (On Qemu arm64) [ 73.049249] kallsyms_selftest: kallsyms_lookup_name() looked up 151880 symbols [ 73.049331] kallsyms_selftest: The time spent on each symbol is (ns): min=1088, max=46848, avg=6629
quoted
quoted
quoted
OK, I found the right hash function. In this way, the tool does not need to consider the byte order.https://en.wikipedia.org/wiki/Jenkins_hash_function Let's go with jenkins_one_at_a_time_hash(), which looks simpler and doesn't even have to think about sizeof(long). It seems to be closest to our current needs. uint32_t jenkins_one_at_a_time_hash(const uint8_t* key, size_t length) { size_t i = 0; uint32_t hash = 0; while (i != length) { hash += key[i++]; hash += hash << 10; hash ^= hash >> 6; } hash += hash << 3; hash ^= hash >> 11; hash += hash << 15; return hash; }quoted
include/linux/stringhash.h /* * Version 1: one byte at a time. Example of use: * * unsigned long hash = init_name_hash; * while (*p) * hash = partial_name_hash(tolower(*p++), hash); * hash = end_name_hash(hash);quoted
Luis .
-- Regards, Zhen Lei