Re: [v6 2/4] mm, oom: cgroup-aware OOM killer
From: Roman Gushchin <hidden>
Date: 2017-08-24 13:59:16
Also in:
linux-mm, lkml
On Thu, Aug 24, 2017 at 02:58:11PM +0200, Michal Hocko wrote:
> On Thu 24-08-17 13:28:46, Roman Gushchin wrote:
> > Hi Michal!
> There is nothing like a "better victim". We are pretty much in a catastrophic situation when we try to survive by killing a userspace.
Not necessarily, it can be a cgroup OOM.
> We try to kill the largest because that assumes that we return the most memory from it. Now I do understand that you want to treat the memcg as a single killable entity but I find it really questionable to do a per-memcg metric and then not treat it like that and kill only a single task. Just imagine a single memcg with zillions of tasks, each very small, and you select it as the largest while a small task itself doesn't help to get us out of the OOM.
I don't think it's different from a non-containerized state: if you have zillions of small tasks in the system, you'll hit the same issue.
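To make the "zillions of small tasks" concern concrete, here is a minimal sketch with hypothetical numbers (not kernel code). The helper `largest_task_share` is an illustrative name; it computes how much of a group's total usage is returned by killing only its largest task. The same arithmetic applies whether the group is one memcg or the whole non-containerized system.

```python
def largest_task_share(task_pages):
    """Fraction of the group's total usage freed by killing only its largest task."""
    total = sum(task_pages)
    return max(task_pages) / total

# Hypothetical workloads, sizes in pages.
many_small = [10] * 10_000        # 100,000 pages spread over tiny tasks
one_big = [50_000] + [10] * 100   # one dominant task plus a few small ones

# Killing one task barely helps in the first case and helps a lot in the second.
print(largest_task_share(many_small))
print(largest_task_share(one_big))
```

In the first workload a single kill returns a negligible share of the memory, regardless of whether the tasks sit in one memcg or in a flat system.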
> > > I guess I have asked already and we haven't reached any consensus. I do not like how you treat memcgs and tasks differently. Why cannot we have a memcg score be a sum of all its tasks?
> > It sounds like a more expensive way to get almost the same with less accuracy. Why is it better?
> because then you are comparing apples to apples?
Well, I can say that I compare some number of pages against some other number of pages. And the relation between a page and a memcg is more obvious than the relation between a page and a process. Both ways are not ideal, and a sum over the processes is not ideal either, especially if you take oom_score_adj into account. Will you respect it? I actually started with such an approach, but then found it weird.
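The two scoring approaches under discussion can be contrasted with a simplified model. This is a sketch under stated assumptions, not the patch's implementation: `Task`, `task_score`, `memcg_score_by_tasks` and `memcg_score_by_pages` are hypothetical names, the per-task score is reduced to resident pages plus oom_score_adj scaled by total memory, and all numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Task:
    rss_pages: int          # pages used by the task (simplified model)
    oom_score_adj: int = 0  # -1000..1000, as in /proc/<pid>/oom_score_adj

def task_score(task, total_pages):
    """Simplified per-task score: usage plus oom_score_adj scaled to total memory."""
    if task.oom_score_adj == -1000:
        return 0  # task is exempt from OOM killing
    adj = task.oom_score_adj * total_pages // 1000
    return max(task.rss_pages + adj, 1)

def memcg_score_by_tasks(tasks, total_pages):
    """Memcg score as the sum of its tasks' scores."""
    return sum(task_score(t, total_pages) for t in tasks)

def memcg_score_by_pages(charged_pages):
    """Memcg score as the number of pages charged to it."""
    return charged_pages

TOTAL = 100_000  # hypothetical total memory in pages

# memcg A: 30,000 charged pages (including page cache), one task protected by oom_score_adj.
a_tasks = [Task(10_000), Task(10_000, oom_score_adj=-500)]
a_charged = 30_000
# memcg B: 20,000 charged pages, one large unadjusted task.
b_tasks = [Task(18_000)]
b_charged = 20_000

by_pages = "A" if memcg_score_by_pages(a_charged) > memcg_score_by_pages(b_charged) else "B"
by_tasks = "A" if memcg_score_by_tasks(a_tasks, TOTAL) > memcg_score_by_tasks(b_tasks, TOTAL) else "B"
print(by_pages, by_tasks)
```

With these numbers the page-based metric selects A while the task-sum metric selects B, which is the kind of disagreement oom_score_adj introduces once per-task scores are summed.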
> Besides that you have to check each task for over-killing anyway. So I do not see any performance merits here.
It's an implementation detail, and we can hopefully get rid of it at some point.
> > > How do you want to compare memcg score with tasks score?
> > I have to do it for tasks in root cgroups, but it shouldn't be a common case.
> How come? I can easily imagine a setup where only some memcgs really do need kill-all semantics while all others can live with a single task killed perfectly fine.
I mean, taking the unified cgroup hierarchy into account, there should not be a lot of tasks in the root cgroup, if any.