Re: [RFC PATCH 2/2] mm,oom: Try last second allocation after selecting an OOM victim.

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: 2017-08-24 15:51:53

Michal Hocko wrote:

On Thu 24-08-17 21:18:26, Tetsuo Handa wrote:

Manish Jaggi noticed that running the LTP oom01/oom02 tests with a high core
count causes random kernel panics when the OOM killer selects an OOM victim
which consumed memory in a way the OOM reaper does not help [1].

Since commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks") changed task_will_free_mem(current) in out_of_memory()
to return false as soon as MMF_OOM_SKIP is set, many threads sharing the
victim's mm were not able to try allocation from memory reserves after the
OOM reaper gave up reclaiming memory.

I proposed a patch which allows task_will_free_mem(current) in
out_of_memory() to ignore MMF_OOM_SKIP once, so that all OOM victim
threads are guaranteed to have tried an ALLOC_OOM allocation before
the next OOM victim is selected [2], because Michal Hocko did not like
calling get_page_from_freelist() from the OOM killer, which is a layer
violation [3]. But now Michal thinks that calling get_page_from_freelist()
after the task_will_free_mem(current) test is better than allowing
task_will_free_mem(current) to ignore MMF_OOM_SKIP once [4], because
this would also help other cases where we race with an exiting task, or
where somebody managed to free memory while we were selecting an OOM
victim, which can take quite some time.

This is a lot of text which can be more confusing than helpful. Could you
state the problem clearly without detours? Yes, the oom killer selection
can race with those freeing memory. And it has been like that since
basically ever.

The problem which Manish Jaggi reported (and I can still reproduce) is that
the OOM killer ignores MMF_OOM_SKIP mm too early. And the problem became real
in 4.8 due to commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks"). Thus, it has _not_ been like that since basically ever.

                Doing a last minute allocation attempt might help. Now
there are more important questions. How likely is that? Do people have
to care? __alloc_pages_may_oom already does an almost-the-last-moment
allocation. Do we still need it?

get_page_from_freelist() in __alloc_pages_may_oom() would help only if
MMF_OOM_SKIP is set after some memory is reclaimed. But the problem is
that MMF_OOM_SKIP is set without reclaiming any memory.

                                 It also does ALLOC_WMARK_HIGH
allocation which your path doesn't do.

The intent of this patch is to replace "[PATCH v2] mm, oom:
task_will_free_mem(current) should ignore MMF_OOM_SKIP for once.",
which you nacked 3 days ago.

                                       I wanted to remove this some time
ago but it has been pointed out that this was really needed
https://patchwork.kernel.org/patch/8153841/ Maybe things have changed
and if so please explain.

get_page_from_freelist() in __alloc_pages_may_oom() will remain needed
because it can help allocations which do not call oom_kill_process() (i.e.
allocations which do "goto out;" in __alloc_pages_may_oom() without calling
out_of_memory(), and allocations which do "return;" in out_of_memory()
without calling oom_kill_process() (e.g. !__GFP_FS)) to succeed.
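
To make the three paths above concrete, here is a minimal userspace sketch. It is
a toy model, not kernel code: every name in it (toy_request, toy_outcome,
toy_may_oom, the two page_free_* flags) is hypothetical, and it only mirrors the
ordering described in this mail, assuming the proposed second attempt sits after
victim selection and before the kill.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this toy model; not kernel identifiers. */
enum toy_outcome {
    TOY_ALLOCATED_EARLY, /* existing last-moment attempt succeeded */
    TOY_SKIPPED_OOM,     /* models "goto out;" before out_of_memory() */
    TOY_OOM_NO_KILL,     /* models "return;" in out_of_memory(), e.g. !__GFP_FS */
    TOY_ALLOCATED_LATE,  /* proposed attempt after victim selection succeeded */
    TOY_VICTIM_KILLED    /* nothing was free, so the victim is killed */
};

/* Hypothetical request description; the fields are assumptions of the model. */
struct toy_request {
    bool bails_before_oom; /* request leaves before out_of_memory() */
    bool allows_fs;        /* stands in for __GFP_FS being set */
};

/*
 * page_free_early: a page is available at the existing attempt in
 *                  __alloc_pages_may_oom().
 * page_free_late:  a page becomes available only while a victim is selected.
 */
enum toy_outcome toy_may_oom(struct toy_request req,
                             bool page_free_early, bool page_free_late)
{
    /* The existing attempt runs first, so it can help every request. */
    if (page_free_early)
        return TOY_ALLOCATED_EARLY;

    /* These two paths never reach the kill, so a later attempt is not run. */
    if (req.bails_before_oom)
        return TOY_SKIPPED_OOM;
    if (!req.allows_fs)
        return TOY_OOM_NO_KILL;

    /* Only requests that select a victim get the second attempt. */
    if (page_free_late)
        return TOY_ALLOCATED_LATE;
    return TOY_VICTIM_KILLED;
}
```

In this model a request with allows_fs == false can only succeed through the
early attempt, which is why the existing call stays useful even if a second
attempt is added after victim selection.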

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
