Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
From: Tejun Heo <hidden>
Date: 2016-06-16 19:35:10
Also in: lkml
Subsystem: scheduler, the rest · Maintainers: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds

Hello,

So, the issue of the initial worker not having its affinity set
correctly wasn't caused by the order of the operations. Reordering
just made set_cpus_allowed get tried one more time, late enough that
it hides the race condition most of the time. The problem is that
CPU_ONLINE callbacks are called while the cpu being onlined is online
but not active, and select_fallback_rq() only considers active cpus.
So if a kthread gets scheduled in the meantime and it doesn't have any
active cpu in its allowed mask, its allowed mask gets reset to
cpu_possible_mask.

Would something like the following make sense?

Thanks.

------ 8< ------
Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus

During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active. A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online. The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().

In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online. Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Gautham R Shenoy <redacted>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 017d539..a12e3db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_active(dest_cpu))
+			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
+				continue;
+			if (!cpu_online(dest_cpu))
 				continue;
 			goto out;
 		}
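
For illustration only, the effect of the changed check can be modelled in
userspace. The sketch below is not kernel code: the types task_model and
hotplug_model, the helper mask_test(), and the function pick_allowed_cpu()
are hypothetical names made up for this example, and CPU masks are reduced
to plain 32-bit bitmaps. It only mirrors the inner loop touched by the
patch, assuming the "active" mask is a subset of the "online" mask; a
return of -1 stands for the case where the kernel would go on to widen the
task's allowed mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, one bit per CPU. Not kernel code. */
typedef uint32_t cpumask_model_t;

struct task_model {
	bool is_kthread;          /* stands in for p->flags & PF_KTHREAD */
	cpumask_model_t allowed;  /* stands in for tsk_cpus_allowed(p) */
};

struct hotplug_model {
	cpumask_model_t online;   /* stands in for cpu_online() */
	cpumask_model_t active;   /* stands in for cpu_active(), assumed subset of online */
};

static bool mask_test(cpumask_model_t mask, int cpu)
{
	return (mask >> cpu) & 1u;
}

/*
 * Return the first CPU in the task's allowed mask that passes the check,
 * or -1 if none does. With patched == false the pre-patch rule applies
 * (CPU must be active); with patched == true the rule from the diff
 * applies (kthreads only need an online CPU, other tasks still need an
 * active one).
 */
static int pick_allowed_cpu(const struct task_model *p,
			    const struct hotplug_model *hp,
			    bool patched, int nr_cpus)
{
	assert(nr_cpus >= 0 && nr_cpus <= 32);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!mask_test(p->allowed, cpu))
			continue;
		if (patched) {
			if (!p->is_kthread && !mask_test(hp->active, cpu))
				continue;
			if (!mask_test(hp->online, cpu))
				continue;
		} else {
			if (!mask_test(hp->active, cpu))
				continue;
		}
		return cpu;
	}
	return -1;
}
```

With CPU 2 online but not yet active and a kthread bound only to CPU 2,
the pre-patch rule finds no candidate in this model, while the patched
rule picks CPU 2. A non-kthread task with the same mask still finds no
candidate under either rule.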