Re: bug in memcg oom-killer results in a hung syscall in another process in the same cgroup
From: Oleg Nesterov <oleg@redhat.com>
Date: 2016-07-14 13:23:05
Also in: linux-mm, lkml
On 07/12, Shayan Pooya wrote:
> > Yep. Bug still not fixed in upstream. In our kernel I've plugged it with this:
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2808,8 +2808,9 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >  	balance_callback(rq);
> >  	preempt_enable();
> > -	if (current->set_child_tid)
> > -		put_user(task_pid_vnr(current), current->set_child_tid);
> > +	if (current->set_child_tid &&
> > +	    put_user(task_pid_vnr(current), current->set_child_tid))
> > +		force_sig(SIGSEGV, current);
> >  }
>
> I just verified that with your patch there are no hung processes and I see
> processes getting SIGSEGV as expected.

Well, but we can't do this. And "as expected" is actually just wrong. I still
think that the whole FAULT_FLAG_USER logic is not right. This needs another
email.

fork() should not fail because there is a memory hog in the same memcg. Worse,
pthread_create() can kill the caller for the same reason. And we have the same
or even worse problem with ->clear_child_tid: pthread_join() can hang forever.
It is unlikely that we want to kill the application in this case ;)

And in fact I think that the problem has nothing to do with
set/clear_child_tid in particular.

I am just curious... can you reproduce the problem reliably? If yes, can you
try the patch below? Just in case, this is not the real fix in any case...

Oleg.

--- x/kernel/sched/core.c
+++ x/kernel/sched/core.c
@@ -2793,8 +2793,11 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	balance_callback(rq);
 	preempt_enable();
-	if (current->set_child_tid)
+	if (current->set_child_tid) {
+		mem_cgroup_oom_enable();
 		put_user(task_pid_vnr(current), current->set_child_tid);
+		mem_cgroup_oom_disable();
+	}
 }
 
 /*
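
For readers less familiar with ->set_child_tid: the put_user() in schedule_tail() is the store that CLONE_CHILD_SETTID asks for, i.e. the kernel writes the new task's TID into the child's memory before the child runs any user code. The following user-space sketch only illustrates that contract; it is not part of either patch and does not reproduce the memcg hang. It assumes Linux with the glibc clone() wrapper, and the names child_tid_slot, child_fn and child_sees_own_tid are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical slot; the kernel stores the child's TID here (CLONE_CHILD_SETTID). */
static volatile pid_t child_tid_slot;

/* Child entry point: exit status 0 if the slot already holds our own TID. */
static int child_fn(void *arg)
{
	(void)arg;
	pid_t self = (pid_t)syscall(SYS_gettid);
	return child_tid_slot == self ? 0 : 1;
}

/* Returns 1 if the child saw its own TID in the slot, 0 if not, -1 on error. */
int child_sees_own_tid(void)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	if (!stack)
		return -1;

	child_tid_slot = 0;
	/*
	 * No CLONE_VM: the child runs on a copy of our memory, so the
	 * kernel's store lands only in the child's copy of the slot.
	 * The stack grows down, hence the pointer to the top of the buffer.
	 */
	pid_t pid = clone(child_fn, stack + stack_size,
			  CLONE_CHILD_SETTID | SIGCHLD, NULL,
			  NULL, NULL, (pid_t *)&child_tid_slot);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);
	free(stack);
	if (waited != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling child_sees_own_tid() from a test program should return 1 on a normally behaving system. The case discussed in this thread is the one where that store faults under memcg pressure and the return value of put_user() is ignored.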