Re: [PATCH] sched: fix the intention to re-evalute tick dependency for offline cpu
From: Frederic Weisbecker <hidden>
Date: 2016-08-10 18:53:09
Also in: lkml
On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker [off-list ref]:
>> On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>>> From: Wanpeng Li <redacted>
>>>
>>> The dl task will be replenished after dl task timer fire and start a
>>> new period. It will be enqueued and to re-evaluate its dependency on
>>> the tick in order to restart it. However, if cpu is hot-unplug,
>>> irq_work_queue will splash since the target cpu is offline.
>>>
>>> As a result:
>>>
>>> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>>> Call Trace:
>>>  dump_stack+0x99/0xd0
>>>  __warn+0xd1/0xf0
>>>  warn_slowpath_null+0x1d/0x20
>>>  irq_work_queue_on+0xad/0xe0
>>>  tick_nohz_full_kick_cpu+0x44/0x50
>>>  tick_nohz_dep_set_cpu+0x74/0xb0
>>>  enqueue_task_dl+0x226/0x480
>>>  activate_task+0x5c/0xa0
>>>  dl_task_timer+0x19b/0x2c0
>>>  ? push_dl_task.part.31+0x190/0x190
>>>
>>> This can be triggered by hot-unplug the full dynticks cpu which dl
>>> task is running on.
>>>
>>> Actually we don't need to restart the tick since the target cpu is
>>> offline and nothing need scheduler tick. This patch fix it by not
>>> intend to re-evaluate tick dependency if the cpu is offline.
>>>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Juri Lelli <redacted>
>>> Cc: Luca Abeni <redacted>
>>> Signed-off-by: Wanpeng Li <redacted>
>>> ---
>>>  kernel/sched/core.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 7f2cae4..43b494f 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>>>  {
>>>  	int fifo_nr_running;
>>>
>>> +	if (unlikely(!rq->online))
>>> +		return true;
>>> +
>>
>> I see, the CPU is offline but the tasks haven't been migrated yet.
>> That said it seems that rollback is still possible at this stage.
>> Somehow we may need to deal with it.
>
> Thanks for your review, Frederic. :)
>
> The rq lock is held to serialize concurrent cpu hot-plug and dl task
> enqueue path (sched_can_stop_tick() is called in this path), so I
> think there is no issue here.

It's not about concurrency though. The concern is this: the CPU runs tickless and cpu_down() starts but eventually fails. If a dl task needs the tick in the meantime and we ignore the IPI due to cpu_is_offline(), we may still be running tickless forever after cpu_down() exits with failure.
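
To make the rollback scenario concrete, here is a minimal user-space sketch of it. It is a toy model, not kernel code: struct toy_rq and every toy_* function are hypothetical names made up for illustration, and the "kick" is reduced to clearing a flag. It only assumes the early return on !rq->online from the patch quoted above.

```c
#include <stdbool.h>

/* Toy user-space model of the scenario; NOT kernel code.
 * All names below are hypothetical and exist only for illustration. */
struct toy_rq {
	bool online;        /* stands in for rq->online from the patch */
	bool tick_stopped;  /* the CPU is currently running tickless */
	int  dl_nr_running; /* queued deadline tasks that need the tick */
};

/* Tick-dependency check with the early return proposed by the patch. */
static bool toy_can_stop_tick(const struct toy_rq *rq)
{
	if (!rq->online)
		return true;	/* offline rq: claim nothing needs the tick */
	return rq->dl_nr_running == 0;
}

/* Enqueue a dl task and restart the tick only if the check asks for it. */
static void toy_enqueue_dl(struct toy_rq *rq)
{
	rq->dl_nr_running++;
	if (!toy_can_stop_tick(rq))
		rq->tick_stopped = false;	/* stands in for the kick IPI */
}

/* A cpu_down() attempt that marks the rq offline, then fails and rolls back.
 * Returns true if the CPU ends up tickless with a dl task still queued. */
static bool toy_tick_lost_after_failed_down(void)
{
	struct toy_rq rq = {
		.online = true, .tick_stopped = true, .dl_nr_running = 0,
	};

	rq.online = false;	/* cpu_down() begins */
	toy_enqueue_dl(&rq);	/* dl task is enqueued in that window */
	rq.online = true;	/* cpu_down() fails, state is rolled back */

	/* Nothing in this model re-evaluates the dependency on rollback. */
	return rq.tick_stopped && rq.dl_nr_running > 0;
}
```

In this model the enqueue during the offline window skips the kick, so after the rollback the rq is online again with a dl task queued and the tick still stopped. Whether the real rollback path re-evaluates the tick dependency is exactly the open question here; the sketch does not answer it.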