Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: 2016-06-25 05:44:48
Oleg Nesterov wrote:
Since I mentioned TIF_MEMDIE in another thread, I simply can't resist. Sorry for grunting. On 06/24, Tetsuo Handa wrote:quoted
--- a/include/linux/sched.h +++ b/include/linux/sched.h@@ -801,6 +801,7 @@ struct signal_struct { * oom */ bool oom_flag_origin; + bool oom_ignore_victims; /* Ignore oom_victims value */ short oom_score_adj; /* OOM kill score adjustment */ short oom_score_adj_min; /* OOM kill score adjustment min value. * Only settable by CAP_SYS_RESOURCE. */Yet another kludge to fix yet another problem with TIF_MEMDIE. Not to mention that that wh Can't we state the fact TIF_MEMDIE is just broken? The very idea imo.
Yes. TIF_MEMDIE is a trouble maker. Setting TIF_MEMDIE is per task_struct operation. Sending SIGKILL is per signal_struct operation. OOM killer is per mm_struct operation.
I am starting to seriously think we should kill this flag, fix the compilation errors, remove the dead code (including the oom_victims logic), and then try to add something else. Say, even MMF_MEMDIE looks better although I understand it is not that simple.
I wish that TIF_MEMDIE is per signal_struct flag. But since we allow mm-less TIF_MEMDIE thread to use ALLOC_NO_WATERMARKS via TIF_MEMDIE inside __mmput() from mmput() from exit_mm() from do_exit(), we can't replace test_thread_flag(TIF_MEMDIE) in gfp_to_alloc_flags() with current->signal->oom_killed or current->mm && (current->mm->flags & MMF_MEMDIE) . But
Just one question. Why do we need this bit outside of oom-kill.c? It affects page_alloc.c and this probably makes sense. But who get this flag when we decide to kill the memory hog? A random thread foung by find_lock_task_mm(), iow a random thread with ->mm != NULL, likely the group leader. This simply can not be right no matter what.
I agree that setting TIF_MEMDIE to only first ->mm != NULL thread does not make sense. I've proposed setting TIF_MEMDIE to all ->mm != NULL threads which are killed by the OOM killer because doing so won't increase the risk of depleting the memory reserves, for TIF_MEMDIE helps only if that thread is doing memory allocation ( http://lkml.kernel.org/r/1458529634-5951-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp ), but it did not happen.
And in any case I don't understand this patch but I have to admit that I failed to force myself to read the changelog and the actual change ;) In any case I agree that we should not set MMF_MEMDIE if ->mm == NULL, and if we ensure this then I do not understand why we can't rely on MMF_OOM_REAPED. Ignoring the obvious races, if ->oom_victims != 0 then find_lock_task_mm() should succed.
Since we are using mm = current->mm; current->mm = NULL; __mmput(mm); (may block for unbounded period waiting for somebody else's memory allocation) exit_oom_victim(current); sequence, we won't be able to make find_lock_task_mm(tsk) != NULL when tsk->signal->oom_victims != 0 unless we change this sequence. My patch tries to rescue it using tsk->signal->oom_ignore_victims flag.
Oleg.
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>