Re: [bisected][mainline]Kernel warnings at kernel/sched/cpudeadline.c:219
From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: 2025-10-09 11:54:38
Also in:
lkml
On 09.10.2025 10:00, Peter Zijlstra wrote:
> On Wed, Oct 08, 2025 at 11:39:11PM +0530, Shrikanth Hegde wrote:
> > *It pointed to this*
> >
> > NIP [c0000000001fd798] dl_server_start+0x50/0xd8
> > LR [c0000000001d9534] enqueue_task_fair+0x228/0x8ec
> > Call Trace:
> > [c000006684a579c0] [0000000000000001] 0x1 (unreliable)
> > [c000006684a579f0] [c0000000001d9534] enqueue_task_fair+0x228/0x8ec
> > [c000006684a57a60] [c0000000001bb344] enqueue_task+0x5c/0x1c8
> > [c000006684a57aa0] [c0000000001c5fc0] ttwu_do_activate+0x98/0x2fc
> > [c000006684a57af0] [c0000000001c671c] try_to_wake_up+0x2e0/0xa60
> > [c000006684a57b80] [c00000000019fb48] kthread_park+0x7c/0xf0
> > [c000006684a57bb0] [c00000000015fefc] takedown_cpu+0x60/0x194
> > [c000006684a57c00] [c000000000161924] cpuhp_invoke_callback+0x1f4/0x9a4
> > [c000006684a57c90] [c0000000001621a4] __cpuhp_invoke_callback_range+0xd0/0x188
> > [c000006684a57d30] [c000000000165aec] _cpu_down+0x19c/0x560
> > [c000006684a57df0] [c0000000001637c0] __cpu_down_maps_locked+0x2c/0x3c
> > [c000006684a57e10] [c00000000018a100] work_for_cpu_fn+0x38/0x54
> > [c000006684a57e40] [c00000000019075c] process_one_work+0x1d8/0x554
> > [c000006684a57ef0] [c00000000019165c] worker_thread+0x308/0x46c
> > [c000006684a57f90] [c00000000019e474] kthread+0x16c/0x19c
> > [c000006684a57fe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
> >
> > It is takedown_cpu(), called from CPU0 (the boot CPU), that wakes up the kthread, which I guess is CPU-bound. Since this happens after the rq was marked offline, it ends up starting the deadline server again.
> >
> > So I think it is a sensible idea to stop the deadline server if the CPU is going down. Once we stop the server, we will return HRTIMER_NORESTART.
>
> D'0h.. that stop was far too early.
>
> How about moving that dl_server_stop() into sched_cpu_dying() like so. This seems to survive a few hotplugs for me.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 198d2dd45f59..f1ebf67b48e2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8571,10 +8571,12 @@ int sched_cpu_dying(unsigned int cpu)
>  	sched_tick_stop(cpu);
>  
>  	rq_lock_irqsave(rq, &rf);
> +	update_rq_clock(rq);
>  	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
>  		WARN(true, "Dying CPU not properly vacated!");
>  		dump_rq_tasks(rq, KERN_WARNING);
>  	}
> +	dl_server_stop(&rq->fair_server);
>  	rq_unlock_irqrestore(rq, &rf);
>  
>  	calc_load_migrate(rq);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 615411a0a881..7b7671060bf9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1582,6 +1582,9 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>  	if (!dl_server(dl_se) || dl_se->dl_server_active)
>  		return;
>  
> +	if (WARN_ON_ONCE(!cpu_online(cpu_of(rq))))
> +		return;
> +
>  	dl_se->dl_server_active = 1;
>  	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
>  	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
This fixes a similar issue observed on Samsung Exynos SoC-based boards (32-bit and 64-bit ARM) that I've reported in the following thread:

https://lore.kernel.org/all/e56310b5-f7a9-4fad-b79a-dcbcdd3d3883@samsung.com/

Thanks for the fix! Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland