Thread: 38 messages, 3 authors, 2018-09-07

Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: 2018-08-19 14:23:59

On 2018/08/14 20:33, Michal Hocko wrote:
On Sat 11-08-18 12:12:52, Tetsuo Handa wrote:
On 2018/08/10 20:16, Michal Hocko wrote:
How do you decide whether oom_reaper() was not able to reclaim much?
Just a rule of thumb. If it freed at least a few kBs then we should be good
to set MMF_OOM_SKIP.
I don't think so. We are talking about situations where MMF_OOM_SKIP is set
before enough memory was reclaimed to prevent the OOM killer from selecting
the next OOM victim.
There is no such thing as enough memory to prevent a new victim selection.
Just think of a streaming source of allocations without any end. There is
simply no way to tell that we have freed enough. We have to guess and
tune based on reasonable workloads.
I'm not talking about the "allocation without any end" case.
We already inserted fatal_signal_pending(current) checks (except vmalloc()
where tsk_is_oom_victim(current) would be used instead).

What we are talking about is a situation where we could avoid selecting the next
OOM victim if we waited for some more time after MMF_OOM_SKIP was set.
Apart from the fact that the former is "sequential processing" where "the OOM reaper
pays the cost for reclaiming" while the latter is "parallel (or round-robin) processing"
where "the allocating thread pays the cost for reclaiming", both are timeout-based
back-off with a capped number of retry attempts.
And it is exactly the "who pays the price" concern, which I've already tried to
explain, that bothers me.
Are you aware that we can fall into a situation where nobody can pay the price for
reclaiming memory?
I really do not see how making the code more complex by ensuring that
allocators share a fair part of the direct oom reaping will make the
situation any easier.
You are completely ignoring/misunderstanding the background of
commit 9bfe5ded054b8e28 ("mm, oom: remove sleep from under oom_lock").

That patch was applied in order to mitigate a lockup problem caused by the fact
that allocators can deprive the OOM reaper of all the CPU resources it needs for
making progress, due to the very, very broken assumption at

        /*
         * Acquire the oom lock.  If that fails, somebody else is
         * making progress for us.
         */
        if (!mutex_trylock(&oom_lock)) {
                *did_some_progress = 1;
                schedule_timeout_uninterruptible(1);
                return NULL;
        }

on the allocator side.
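
To make the concern concrete, here is a minimal user-space sketch of the same pattern. It is not kernel code: fake_oom_lock, pages_reclaimed and alloc_may_oom() are hypothetical names, and the lock is modelled with a C11 atomic flag. The point it illustrates is that a caller which fails the trylock reports progress without checking that the lock holder reclaimed anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's oom_lock. */
static atomic_flag fake_oom_lock = ATOMIC_FLAG_INIT;

/* Pages actually reclaimed; only updated by whoever holds the lock. */
static unsigned long pages_reclaimed;

/* Returns true if the lock was taken, false if it was already held. */
static bool fake_trylock(void)
{
        return !atomic_flag_test_and_set(&fake_oom_lock);
}

static void fake_unlock(void)
{
        atomic_flag_clear(&fake_oom_lock);
}

/*
 * Mimics the allocator-side logic quoted above. Returns true only if
 * this caller took the lock and did reclaim work itself.
 */
static bool alloc_may_oom(int *did_some_progress)
{
        if (!fake_trylock()) {
                /* Lock is busy: progress is assumed, never verified. */
                *did_some_progress = 1;
                return false;
        }
        pages_reclaimed++;      /* stand-in for real reclaim work */
        *did_some_progress = 1;
        fake_unlock();
        return true;
}
```

If the lock holder is starved of CPU time, every other caller keeps taking the first branch, so did_some_progress stays at 1 while pages_reclaimed never moves.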

Direct OOM reaping is a method for ensuring that allocators spend _some_ CPU
resources on making progress. I already showed how to prevent allocators from
trying to reclaim all (e.g. multiple TB) memory at once, because you were worried about it.
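
As a rough illustration of bounding the work per allocator, the sketch below caps how much a single call reclaims. It is not the code from the posted patch: struct fake_victim, REAP_BATCH_PAGES and direct_reap_some() are hypothetical names, and the batch size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on how many pages one allocator reclaims per attempt. */
#define REAP_BATCH_PAGES 256UL

/* Hypothetical stand-in for an OOM victim's remaining reclaimable memory. */
struct fake_victim {
        unsigned long pages_left;
};

/*
 * Reclaim at most REAP_BATCH_PAGES from the victim and return the number
 * of pages reclaimed. A return value of 0 means nothing is left to reap.
 */
static unsigned long direct_reap_some(struct fake_victim *v)
{
        unsigned long n = v->pages_left < REAP_BATCH_PAGES ?
                          v->pages_left : REAP_BATCH_PAGES;

        v->pages_left -= n;
        return n;
}
```

Each allocating thread that calls direct_reap_some() does a bounded amount of work and then returns to retry its allocation, so no single thread is stuck tearing down a multi-TB address space on its own.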
Really there are basically two issues we really
should be after. Improve the oom reaper to tear down a wider range of
memory (namely mlock) and improve the cooperation with the exit path
to handle free_pgtables more gracefully, because it is true that some
processes might really consume a lot of memory in page tables without
mapping a lot of anonymous memory. Neither of the two is addressed by
your proposal. So if you want to help then try to think about the two
issues.
Your "improvement" is to tear down a wider range of memory, whereas
my "improvement" is to ensure that CPU resources are spent on reclaiming memory, and
David's "improvement" is to mitigate unnecessary killing of additional processes.
Therefore, your "Neither of the two is addressed by your proposal." is pointless.