Re: [PATCH 08/10] exit, oom: postpone exit_oom_victim to later
From: Michal Hocko <mhocko@kernel.org>
Date: 2016-08-02 11:31:32
On Tue 02-08-16 19:32:45, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > It is possible that a user creates a process with 10000 threads and lets that process be OOM-killed. Then, this patch allows 10000 threads to start consuming memory reserves after they left exit_mm(). OOM victims are not the only threads who need to allocate memory for termination. Non OOM victims might need to allocate memory at exit_task_work() in order to allow OOM victims to make forward progress.
> > > >
> > > > this might be possible but unlike the regular exiting tasks we do reclaim oom victim's memory in the background. So while they can consume memory reserves we should also give some (and arguably much more) memory back. The reserves are there to expedite the exit.
> > >
> > > Background reclaim does not occur on CONFIG_MMU=n kernels. But this patch also affects CONFIG_MMU=n kernels. If a process with two threads was OOM-killed and one thread consumed too much memory after it left exit_mm() before the other thread sets MMF_OOM_SKIP on their mm by returning from exit_aio() etc. in __mmput() from mmput() from exit_mm(), this patch introduces a new possibility to OOM livelock. I think it is wild to assume that "CONFIG_MMU=n kernels can OOM livelock even without this patch. Thus, let's apply this patch even though this patch might break the balance of OOM handling in CONFIG_MMU=n kernels."
> >
> > As I've said if you have strong doubts about the patch I can drop it for now. I do agree that nommu really matters here, though.
>
> OK. Then, for now let's postpone only the oom_killer_disable() to later rather than postpone the exit_oom_victim() to later.

That would require other changes (basically making oom_killer_disable independent of TIF_MEMDIE), which I think doesn't belong in this pile. So I would rather sacrifice this patch instead, and it will not be part of the v2.

[...]
> > > > > I think that allocations from do_exit() are important for terminating cleanly (from the point of view of filesystem integrity and kernel object management) and such allocations should not be given up simply because ALLOC_NO_WATERMARKS allocations failed.
> > > >
> > > > We are talking about a fatal condition when OOM killer forcefully kills a task. Chances are that the userspace leaves so much state behind that a manual cleanup would be necessary anyway. Depleting the memory reserves is not nice but I really believe that this particular patch doesn't make the situation really much worse than before.
> > >
> > > I'm not talking about inconsistency in userspace programs. I'm talking about inconsistency of objects managed by kernel (e.g. failing to drop references) caused by allocation failures.
> >
> > That would be a bug on its own, no?
>
> Right, but memory allocations after exit_mm() from do_exit() (e.g. exit_task_work()) might assume (or depend on) the "too small to fail" memory-allocation rule where small GFP_FS allocations won't fail unless TIF_MEMDIE is set, but this patch can unexpectedly break that rule if they assume (or depend on) that rule.

Silent dependency on nofail semantics without GFP_NOFAIL is still a bug. Full stop. I really fail to see why you are still arguing about that.

[...]
--
Michal Hocko
SUSE Labs