Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE

From: Tejun Heo <hidden>
Date: 2016-06-16 19:35:10
Also in: lkml
Subsystem: scheduler, the rest · Maintainers: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds

Hello,

So, the issue of the initial worker not having its affinity set
correctly wasn't caused by the order of the operations.  Reordering
just made set_cpus_allowed get tried one more time, late enough that
it hides the race condition most of the time.  The problem is that
CPU_ONLINE callbacks are called while the cpu being onlined is online
but not active, and select_fallback_rq() only considers active cpus.
So if a kthread gets scheduled in the meantime and it doesn't have any
cpu which is active in its allowed mask, its allowed mask gets reset
to cpu_possible_mask.

Would something like the following make sense?

Thanks.
------ 8< ------
Subject: [PATCH] sched: allow kthreads to fall back to online && !active cpus

During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active.  A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online.  The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().

In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online.  Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Gautham R Shenoy <redacted>
---
 kernel/sched/core.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 017d539..a12e3db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_active(dest_cpu))
+			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
+				continue;
+			if (!cpu_online(dest_cpu))
 				continue;
 			goto out;
 		}
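
As a rough illustration of the check changed above, here is a small
userspace model in C.  It is only a sketch: the model_* names, the
bitmask representation of the online/active masks and the 8-CPU limit
are all assumptions made up for this example.  It does not model the
later steps of select_fallback_rq() which widen the allowed mask; it
just returns -1 where the kernel would move on to them.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8	/* hypothetical limit for this model only */

/* Hypothetical stand-ins for the online/active masks: bit n == CPU n. */
struct model_cpu_state {
	unsigned int online;
	unsigned int active;
};

static unsigned int model_cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return 1u << cpu;
}

static bool model_cpu_online(const struct model_cpu_state *s, int cpu)
{
	return (s->online & model_cpu_bit(cpu)) != 0;
}

static bool model_cpu_active(const struct model_cpu_state *s, int cpu)
{
	return (s->active & model_cpu_bit(cpu)) != 0;
}

/*
 * Model of the "any allowed CPU?" loop only.  Returns the first CPU in
 * @allowed which passes the check, or -1 if none does.  @patched
 * selects between the check before and after the diff above.
 */
static int model_pick_allowed_cpu(const struct model_cpu_state *s,
				  unsigned int allowed, bool is_kthread,
				  bool patched)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(allowed & model_cpu_bit(cpu)))
			continue;
		if (patched) {
			/* Only non-kthreads still require an active CPU. */
			if (!is_kthread && !model_cpu_active(s, cpu))
				continue;
			if (!model_cpu_online(s, cpu))
				continue;
		} else {
			if (!model_cpu_active(s, cpu))
				continue;
		}
		return cpu;
	}
	return -1;
}
```

With CPU 2 online but not yet active (online = 0x7, active = 0x3) and
a kthread allowed only on CPU 2, the old check finds no CPU, while the
patched check picks CPU 2.  A non-kthread task with the same mask still
skips it.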