Thread (38 messages) 38 messages, 3 authors, 2018-09-07

Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.

From: Michal Hocko <mhocko@kernel.org>
Date: 2018-08-10 11:16:06

On Fri 10-08-18 19:54:39, Tetsuo Handa wrote:
On 2018/08/10 18:07, Michal Hocko wrote:
quoted
quoted
Yes, of course, it would be easy to rely on exit_mmap() to set 
MMF_OOM_SKIP itself and have the oom reaper drop the task from its list 
when we are assured of forward progress.  What polling are you proposing 
other than a timeout based mechanism to do this?
I was thinking about doing something like the following
- oom_reaper checks the amount of victim's memory after it is done with
  reaping (e.g. by calling oom_badness before and after). 
OK. We can apply

+static inline unsigned long oom_victim_mm_score(struct mm_struct *mm)
+{
+	return get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS) +
+		mm_pgtables_bytes(mm) / PAGE_SIZE;
+}

and

-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
-		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	points = oom_victim_mm_score(p->mm);

part, can't we?
quoted
                                                          If it wasn't able to
  reclaim much then return false and keep retrying with the existing
  mechanism
How do you decide whether oom_reaper() was not able to reclaim much?
Just a rule of thumb. If it freed at least few kBs then we should be good
to MMF_OOM_SKIP.
 
quoted
- once a flag (e.g. MMF_OOM_MMAP) is set it bails out and won't set the
  MMF_OOM_SKIP flag.
Unless oom_victim_mm_score() becomes close to 0, setting MMF_OOM_SKIP is
considered premature. oom_reaper() will have to keep retrying....
there absolutely have to be a cap for retrying. Otherwise you have
lockup scenarios back when the memory is mostly consumed by page tables.
quoted
quoted
We could set a MMF_EXIT_MMAP in exit_mmap() to specify that it will 
complete free_pgtables() for that mm.  The problem is the same: when does 
the oom reaper decide to set MMF_OOM_SKIP because MMF_EXIT_MMAP has not 
been set in a timely manner?
reuse the current retry policy which is the number of attempts rather
than any timeout.
And this is really I can't understand. The number of attempts multiplied
by retry interval _is_ nothing but timeout.
Yes it is a timeout but it is not the time that matters. It is that we
have tried sufficient times. Looks at it this way. You can retry 5 times
in 10s or just once. Depending on what is going on in the system. I
would really prefer the behavior to be deterministic.
We are already using timeout based decision, with some attempt to reclaim
memory if conditions are met.
Timeout based decision is when you, well, make a decision after a
certain time passes. And we do not do that.
 
quoted
quoted
If this is an argument that the oom reaper should loop checking for 
MMF_EXIT_MMAP and doing schedule_timeout(1) a set number of times rather 
than just setting the jiffies in the mm itself, that's just implementing 
the same thing and doing so in a way where the oom reaper stalls operating 
on a single mm rather than round-robin iterating over mm's in my patch.
I've said earlier that I do not mind doing round robin in the oom repaer
but this is certainly more complex than what we do now and I haven't
seen any actual example where it would matter. OOM reaper is a safely
measure. Nothing should fall apart if it is slow. 
The OOM reaper can fail if allocating threads have high priority. You seem to
assume that realtime threads won't trigger OOM path. But since !PF_WQ_WORKER
threads do only cond_resched() due to your "the cargo cult programming" refusal,
and like Andrew Morton commented

  cond_resched() is a no-op in the presence of realtime policy threads
  and using to attempt to yield to a different thread it in this fashion
  is broken.

at "mm: disable preemption before swapcache_free" thread, we can't guarantee
that allocating threads shall give the OOM reaper enough CPU resource for
making forward progress. And my direct OOM reaping proposal was also refused
by you. I really dislike counting OOM reaper as a safety measure.
Well, yeah, you can screw up your system with real time priority tasks
all you want. I really fail to see why you are bringing that up now
though. Yet another offtopic?

-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help