Re: [PATCH] kallsyms: Fix sleeping function called from invalid context when CONFIG_KALLSYMS_SELFTEST=y
From: Petr Mladek <pmladek@suse.com>
Date: 2022-12-20 15:43:17
Also in:
lkml
On Tue 2022-12-20 22:29:31, Leizhen (ThunderTown) wrote:
On 2022/12/20 19:01, Petr Mladek wrote:quoted
On Tue 2022-12-20 08:15:40, Christophe Leroy wrote:quoted
Le 20/12/2022 à 07:39, Zhen Lei a écrit :quoted
[T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305 [T58] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 58, name: kallsyms_test [T58] preempt_count: 0, expected: 0 [T58] RCU nest depth: 0, expected: 0 [T58] no locks held by kallsyms_test/58. [T58] irq event stamp: 18899904 [T58] hardirqs last enabled at (18899903): finish_task_switch.isra.0 (core.c:?) [T58] hardirqs last disabled at (18899904): test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?) [T58] softirqs last enabled at (18899886): __do_softirq (??:?) [T58] softirqs last disabled at (18899879): ____do_softirq (irq.c:?) [T58] CPU: 0 PID: 58 Comm: kallsyms_test Tainted: G T 6.1.0-next-20221215 #2 [T58] Hardware name: linux,dummy-virt (DT) [T58] Call trace: [T58] dump_backtrace (??:?) [T58] show_stack (??:?) [T58] dump_stack_lvl (??:?) [T58] dump_stack (??:?) [T58] __might_resched (??:?) [T58] kallsyms_on_each_symbol (??:?) [T58] test_perf_kallsyms_on_each_symbol (kallsyms_selftest.c:?) [T58] test_entry (kallsyms_selftest.c:?) [T58] kthread (kthread.c:?) [T58] ret_from_fork (??:?) [T58] kallsyms_selftest: kallsyms_on_each_symbol() traverse all: 5744310840 ns [T58] kallsyms_selftest: kallsyms_on_each_match_symbol() traverse all: 1164580 ns [T58] kallsyms_selftest: finish Functions kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol() call the user-registered hook function for each symbol that meets the requirements. Because it is uncertain how long that hook function will execute, they call cond_resched() to avoid consuming CPU resources for a long time. However, irqs need to be disabled during the performance test to ensure the accuracy of test data. Because the performance test hook is very clear, very simple function, let's do not call cond_resched() when CONFIG_KALLSYMS_SELFTEST=y.I don't think it is appropriate to change the behaviour of a core function based on whether a compile time option related to tests is selected or not, because you will change the behaviour for all users, not only for the tests.I agree. This is very bad idea. It would change the behavior for the entire system.It just doesn't look so good, but it doesn't affect the entire system, and the proposed changes below will.
I do not understand this. The proposed patch did:
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c@@ -302,7 +302,8 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, ret = fn(data, namebuf, NULL, kallsyms_sym_address(i)); if (ret != 0) return ret; - cond_resched(); + if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST)) + cond_resched(); } return 0; }
@@ -319,7 +320,8 @@ int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long), for (i = start; !ret && i <= end; i++) { ret = fn(data, kallsyms_sym_address(get_symbol_seq(i))); - cond_resched(); + if (!IS_ENABLED(CONFIG_KALLSYMS_SELFTEST)) + cond_resched(); } return ret;
It changes the behavior for any kallsyms_on_each_symbol()
and kallsyms_on_each_match_symbol() caller. And this _is_ the problem!
The cond_resched() is there for a reason. See the commit
f5bdb34bf0c9314548f2 ("livepatch: Avoid CPU hogging with
cond_resched").
If a function explicitely includes cond_resched() because it
delayed something by >1s that it must never be called with
interrupts enabled except when there is a good reason to
do so, e.g. for debugging. And measuring CPU time
in a selftest is not a good reason, definitely!
quoted
If I get it correctly, the problem is this code in kernel/kallsyms_selftest.c:Yes, another method is to remove the interrupt protection.quoted
static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr) { [...] local_irq_save(flags); t0 = sched_clock(); (void)kallsyms_lookup_name(name); t1 = sched_clock(); local_irq_restore(flags); [...] and IRQs are disabled to measure the time spent in this function without interruption and rescheduling. I am sure that there are better ways how to measure the time. Even the "time" command in userspace is able to show time how much CPU time a command used.I've got an idea: local_irq_save(flags); //get the count and cputime of interrupts t0 = sched_clock(); local_irq_restore(flags); (void)kallsyms_lookup_name(name); local_irq_save(flags); t1 = sched_clock(); //get the count and cputime of interrupts local_irq_restore(flags);
Why do you need to disable IRQs around sched_clock(), please? It works well in any context. Is sched_clock() needed at all? Honestly, I do not understand why you are using sched_clock() for this.
quoted
I am not familiar with it. But task_cputime() in kernel/sched/cputime.c looks promising. And there must be the interface how the user space get this information. Some is available via /proc/<PID>/... I am not sure if there is a syscall.
Please, try to investigate the above hints. Honestly, it looks like you do not know what you are doing. I NACK this patch. Best Regards, Petr