Re: process hangs on do_exit when oom happens
From: Mike Galbraith <hidden>
Date: 2012-10-26 20:04:10
Also in: linux-mm, linux-mmc, lkml
Subsystem: scheduler, the rest · Maintainers: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds
On Fri, 2012-10-26 at 10:03 -0700, Mike Galbraith wrote:
> The bug is in the patch that used sched_setscheduler_nocheck(). Plain
> sched_setscheduler() would have replied -EGOAWAY.

sched_setscheduler_nocheck() should say go away too methinks. This
isn't about permissions, it's about not being stupid in general.
sched: fix __sched_setscheduler() RT_GROUP_SCHED conditionals
Remove the user and rt_bandwidth_enabled() RT_GROUP_SCHED conditionals in
__sched_setscheduler(). Whether the kernel or a user promotes a task
in a group with zero rt_runtime allocated, the end result is the same
bad thing, and the throttle switch position matters little. It's safer
to just say no based solely upon bandwidth existence, which may also
save the user a nasty surprise if he later flips the throttle switch
to 'on'.
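
To make the change in the conditional concrete, here is a rough userspace model of the check before and after. This is only a sketch: struct group_model, old_check() and new_check() are made-up names for illustration, not kernel code, and the booleans stand in for the user argument, rt_bandwidth_enabled(), rt_policy(policy) and task_group_is_autogroup().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task group state the kernel check reads. */
struct group_model {
	unsigned long long rt_runtime;	/* RT runtime allocated, 0 = none */
	bool autogroup;			/* autogroups are exempt */
};

/* Before: refuse only user requests, and only with throttling enabled. */
static int old_check(bool user, bool bandwidth_enabled, bool rt_policy,
		     const struct group_model *tg)
{
	if (user && bandwidth_enabled && rt_policy &&
	    tg->rt_runtime == 0 && !tg->autogroup)
		return -EPERM;
	return 0;
}

/* After: refuse any RT promotion into a group with no RT runtime. */
static int new_check(bool rt_policy, const struct group_model *tg)
{
	if (rt_policy && tg->rt_runtime == 0 && !tg->autogroup)
		return -EPERM;
	return 0;
}
```

In this model a kernel caller (user == false) promoting a task in a zero-runtime group gets 0 from old_check() but -EPERM from new_check(), which is the case the oom logic hit.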
The commit below came about due to sched_setscheduler_nocheck()
allowing a task in a task group with zero rt_runtime allocated to
be promoted by the kernel oom logic, thus marooning it forever.
<quote>
commit 341aea2bc48bf652777fb015cc2b3dfa9a451817
Author: KOSAKI Motohiro [off-list ref]
Date: Thu Apr 14 15:22:13 2011 -0700
oom-kill: remove boost_dying_task_prio()
This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").
That commit dramatically improved oom killer logic when a fork-bomb
occurs. But I've found that it has nasty corner case. Now cpu cgroup has
strange default RT runtime. It's 0! That said, if a process under cpu
cgroup promote RT scheduling class, the process never run at all.
</quote>
Signed-off-by: Mike Galbraith <redacted>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..d3a35f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3810,17 +3810,14 @@ recheck:
 	}
 
 #ifdef CONFIG_RT_GROUP_SCHED
-	if (user) {
-		/*
-		 * Do not allow realtime tasks into groups that have no runtime
-		 * assigned.
-		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
-				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
-				!task_group_is_autogroup(task_group(p))) {
-			task_rq_unlock(rq, p, &flags);
-			return -EPERM;
-		}
+	/*
+	 * Do not allow realtime tasks into groups that have no runtime
+	 * assigned.
+	 */
+	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0 &&
+			!task_group_is_autogroup(task_group(p))) {
+		task_rq_unlock(rq, p, &flags);
+		return -EPERM;
 	}
 #endif
 