Thread (11 messages), 4 authors, 2016-08-26

Re: [PATCH] sched: fix the intention to re-evaluate tick dependency for offline cpu

From: Frederic Weisbecker <hidden>
Date: 2016-08-10 18:53:09
Also in: lkml

On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
2016-08-10 20:43 GMT+08:00 Frederic Weisbecker [off-list ref]:
On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
From: Wanpeng Li <redacted>

The dl task is replenished after the dl task timer fires and starts a new
period. It is then enqueued, and its dependency on the tick is re-evaluated
in order to restart the tick. However, if the cpu is hot-unplugged,
irq_work_queue will splat since the target cpu is offline.

As a result:

    WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
    Call Trace:
     dump_stack+0x99/0xd0
     __warn+0xd1/0xf0
     warn_slowpath_null+0x1d/0x20
     irq_work_queue_on+0xad/0xe0
     tick_nohz_full_kick_cpu+0x44/0x50
     tick_nohz_dep_set_cpu+0x74/0xb0
     enqueue_task_dl+0x226/0x480
     activate_task+0x5c/0xa0
     dl_task_timer+0x19b/0x2c0
     ? push_dl_task.part.31+0x190/0x190

This can be triggered by hot-unplugging the full dynticks cpu that the dl
task is running on.

Actually we don't need to restart the tick since the target cpu is offline
and nothing needs the scheduler tick. This patch fixes it by not re-evaluating
the tick dependency if the cpu is offline.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <redacted>
Cc: Luca Abeni <redacted>
Signed-off-by: Wanpeng Li <redacted>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..43b494f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
 {
      int fifo_nr_running;

+     if (unlikely(!rq->online))
+             return true;
+

I see, the CPU is offline but the tasks haven't been migrated yet.
That said it seems that rollback is still possible at this stage.

Somehow we may need to deal with it.

Thanks for your review, Frederic. :) The rq lock is held to serialize
concurrent cpu hot-plug and the dl task enqueue path (sched_can_stop_tick()
is called in this path), so I think there is no issue here.

It's not about concurrency though. It's rather that if the CPU runs
tickless, does cpu_down() and fails, then if the dl task needs the tick and
we ignore the IPI due to cpu_is_offline(), we may still be running tickless
forever after the failed cpu_down() exits.
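
The scenario above can be sketched as a small user-space model in C. This is
only an illustration, not kernel code: toy_cpu, toy_can_stop_tick(),
toy_enqueue_dl(), toy_failed_cpu_down() and toy_demo_tick_lost() are
hypothetical names, and the model assumes that any queued dl task needs the
tick and that nothing re-evaluates the tick dependency after a rollback.

```c
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, none is a kernel API. */
struct toy_cpu {
	bool rq_online;     /* plays the role of rq->online in the patch */
	bool tick_running;  /* false means the CPU is running tickless */
	int  dl_nr_running; /* number of queued deadline tasks */
};

/* Check with the proposed early return: an offline rq never needs the tick. */
static bool toy_can_stop_tick(const struct toy_cpu *cpu)
{
	if (!cpu->rq_online)
		return true;
	/* Assumption: a single queued dl task is enough to need the tick. */
	return cpu->dl_nr_running == 0;
}

/* Enqueue one dl task and restart the tick only if the check asks for it. */
static void toy_enqueue_dl(struct toy_cpu *cpu)
{
	cpu->dl_nr_running++;
	if (!toy_can_stop_tick(cpu))
		cpu->tick_running = true; /* stands in for the kick IPI */
}

/* A cpu_down() attempt that fails and rolls back to online. */
static void toy_failed_cpu_down(struct toy_cpu *cpu)
{
	cpu->rq_online = false; /* rq marked offline, tasks not migrated yet */
	toy_enqueue_dl(cpu);    /* dl timer fires here, kick is skipped */
	cpu->rq_online = true;  /* rollback: the CPU stays in service */
}

/* Returns true if the CPU ends up online, with a dl task, but tickless. */
static bool toy_demo_tick_lost(void)
{
	struct toy_cpu cpu = {
		.rq_online = true, .tick_running = false, .dl_nr_running = 0,
	};

	toy_failed_cpu_down(&cpu);
	return cpu.rq_online && cpu.dl_nr_running > 0 && !cpu.tick_running;
}
```

In this model toy_demo_tick_lost() reports the lost tick because the only
point where the tick could be restarted was skipped while the rq was marked
offline. Whether the real rollback path re-evaluates the dependency is
exactly the open question in this thread.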