Thread: 11 messages, 4 authors, 2017-08-14

Re: [v4 2/4] mm, oom: cgroup-aware OOM killer

From: Roman Gushchin <hidden>
Date: 2017-08-01 18:14:29
Also in: linux-mm, lkml

On Tue, Aug 01, 2017 at 07:03:03PM +0200, Michal Hocko wrote:
> On Tue 01-08-17 16:25:48, Roman Gushchin wrote:
> > On Tue, Aug 01, 2017 at 04:54:35PM +0200, Michal Hocko wrote:
> > [...]
> > > I would reap out the oom_kill_process into a separate patch.
> >
> > It was a separate patch, I've merged it based on Vladimir's feedback.
> > No problems, I can divide it back.
>
> It would make the review slightly more easier
>
> > > > -static void oom_kill_process(struct oom_control *oc, const char *message)
> > > > +static void __oom_kill_process(struct task_struct *victim)
> > >
> > > To the rest of the patch. I have to say I do not quite like how it is
> > > implemented. I was hoping for something much simpler which would hook
> > > into oom_evaluate_task. If a task belongs to a memcg with kill-all flag
> > > then we would update the cumulative memcg badness (more specifically the
> > > badness of the topmost parent with kill-all flag). Memcg will then
> > > compete with existing self contained tasks (oom_badness will have to
> > > tell whether points belong to a task or a memcg to allow the caller to
> > > deal with it). But it shouldn't be much more complex than that.
> >
> > I'm not sure, it will be any simpler. Basically I'm doing the same:
> > the difference is that you want to iterate over tasks and for each
> > task traverse the memcg tree, update per-cgroup oom score and find
> > the corresponding memcg(s) with the kill-all flag. I'm doing the opposite:
> > traverse the cgroup tree, and for each leaf cgroup iterate over processes.
>
> Yeah but this doesn't fit very well to the existing scheme so we would
> need two different schemes which is not ideal from maint. point of view.
> We also do not have to duplicate all the tricky checks we already do in
> oom_evaluate_task. So I would prefer if we could try to hook there and
> do the special handling there.

I hope that iterating over all tasks just to check whether there are
in-flight OOM victims can be optimized at some point.
That means we would be able to choose a victim much more cheaply.
It's not easy, but it feels like the right direction to go.

Also, adding new tricks to oom_evaluate_task() will make the code
even more hairy. Some of the existing tricks are useless for memcg selection.
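
To make the two traversal orders discussed above concrete, here is a
minimal userspace sketch in C. It is not code from the patch: the
toy_cgroup and toy_task types, the arrays, and both function names are
hypothetical, and the sketch ignores the kill-all flag, in-flight victims,
and every eligibility check that oom_evaluate_task() performs. It only
illustrates that walking tasks and charging their ancestors gives the same
per-cgroup totals as walking cgroups and summing their tasks.

```c
#include <string.h>

#define NR_CGROUPS 4
#define NR_TASKS   5

/* Hypothetical toy model; none of these names come from the patch. */
struct toy_cgroup { int parent; };              /* -1 marks the root */
struct toy_task   { int cgroup; long points; }; /* points ~ task badness */

/* Layout assumption: a parent always has a smaller index than its children. */
static const struct toy_cgroup cgroups[NR_CGROUPS] = {
	{ -1 }, { 0 }, { 0 }, { 1 },
};
static const struct toy_task tasks[NR_TASKS] = {
	{ 1, 10 }, { 2, 40 }, { 3, 25 }, { 3, 5 }, { 2, 15 },
};

/* Task-first: for each task, walk up the tree and charge every ancestor. */
static void score_by_tasks(long score[NR_CGROUPS])
{
	memset(score, 0, NR_CGROUPS * sizeof(score[0]));
	for (int t = 0; t < NR_TASKS; t++)
		for (int cg = tasks[t].cgroup; cg != -1; cg = cgroups[cg].parent)
			score[cg] += tasks[t].points;
}

/* Cgroup-first: sum each cgroup's own tasks, then fold children into parents. */
static void score_by_cgroups(long score[NR_CGROUPS])
{
	memset(score, 0, NR_CGROUPS * sizeof(score[0]));
	for (int cg = 0; cg < NR_CGROUPS; cg++)
		for (int t = 0; t < NR_TASKS; t++)
			if (tasks[t].cgroup == cg)
				score[cg] += tasks[t].points;

	/* Children have larger indices, so a backward pass visits them first. */
	for (int cg = NR_CGROUPS - 1; cg > 0; cg--)
		score[cgroups[cg].parent] += score[cg];
}
```

Calling both functions on the same data fills the two arrays with
identical totals, so the choice between them is about cost and about how
well each fits the existing code, not about the resulting scores.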

> > Also, please note, that even without the kill-all flag the decision is made
> > on per-cgroup level (except tasks in the root cgroup).
>
> Yeah and I am not sure this is a reasonable behavior. Why should we
> consider memcgs which are not kill-all as a single entity?

I think it's reasonable to choose a cgroup/container to blow off based on
the cgroup oom_priority/size (including hierarchical settings), and then
kill either the biggest task or all tasks, depending on cgroup settings.
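
The two-step policy described above can be sketched as follows. This is
a simplified illustration, not the patch itself: the struct layouts and
the function names pick_cgroup() and pick_victims() are hypothetical, the
cgroup size is taken as given, and hierarchical settings are ignored. It
also assumes that a higher oom_priority is preferred and that size only
breaks ties.

```c
/* Hypothetical toy model; none of these names come from the patch. */
struct toy_task   { int cgroup; long points; };
struct toy_cgroup { int oom_priority; long size; int kill_all; };

/* Step 1: choose a cgroup. Assumption: priority first, size breaks ties. */
static int pick_cgroup(const struct toy_cgroup *cgs, int nr)
{
	int best = -1;

	for (int i = 0; i < nr; i++) {
		if (best < 0 ||
		    cgs[i].oom_priority > cgs[best].oom_priority ||
		    (cgs[i].oom_priority == cgs[best].oom_priority &&
		     cgs[i].size > cgs[best].size))
			best = i;
	}
	return best;
}

/*
 * Step 2: mark victims inside the chosen cgroup. kill[t] is set to 1 for
 * every task to be killed; the return value is the number of victims.
 */
static int pick_victims(const struct toy_cgroup *cgs, int nr_cgs,
			const struct toy_task *tasks, int nr_tasks, int *kill)
{
	int cg = pick_cgroup(cgs, nr_cgs);
	int biggest = -1, killed = 0;

	for (int t = 0; t < nr_tasks; t++) {
		kill[t] = 0;
		if (cg < 0 || tasks[t].cgroup != cg)
			continue;
		if (cgs[cg].kill_all) {
			/* kill-all cgroup: every member is a victim */
			kill[t] = 1;
			killed++;
		} else if (biggest < 0 ||
			   tasks[t].points > tasks[biggest].points) {
			/* otherwise remember only the biggest member */
			biggest = t;
		}
	}
	if (biggest >= 0) {
		kill[biggest] = 1;
		killed = 1;
	}
	return killed;
}
```

In this model the selection is always made at the cgroup level first, and
the kill-all flag only changes what happens inside the selected cgroup.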

Thanks!