Re: [PATCH v3 2/4] mm/oom: handle remote ooms

From: Michal Hocko <hidden>
Date: 2021-11-16 09:39:13
Also in: linux-fsdevel, linux-mm

On Tue 16-11-21 10:28:25, Michal Hocko wrote:
> On Mon 15-11-21 16:58:19, Mina Almasry wrote:
> [...]
> > To be honest I think this is very workable, as is Shakeel's suggestion
> > of MEMCG_OOM_NO_VICTIM. Since this is an opt-in feature, we can
> > document the behavior and if the userspace doesn't want to get killed
> > they can catch the sigbus and handle it gracefully. If not, the
> > userspace just gets killed if we hit this edge case.
>
> I am not sure about the MEMCG_OOM_NO_VICTIM approach. It sounds really
> hackish to me. I will get back to Shakeel's email as time permits. The
> primary problem I have with this, though, is that the kernel oom killer
> cannot really do anything sensible if the limit is reached and there
> is nothing reclaimable left in this case. The tmpfs backed memory will
> simply stay around and there are no means to recover without userspace
> intervention.

And just a small clarification. Tmpfs is fundamentally problematic from
the OOM handling POV. The nuance here is that the OOM happens in a
different memcg and thus a different resource domain. If you kill a task
in the target memcg then you effectively DoS that workload. If you kill
the allocating task then it is DoSed by anybody allowed to write to that
shmem. All that without a graceful fallback.

I still have a very hard time seeing how that can work reasonably, except
for a very special case with a lot of other measures to ensure the
target memcg never hits the hard limit, so that the OOM simply is not a
problem.

The memory controller has always been used to enforce and balance memory
usage among resource domains, and this goes against that principle.
I would be really curious what Johannes thinks about this.
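
For reference, a minimal userspace sketch of what "catch the sigbus and
handle it gracefully" could look like. It assumes the failed remote charge
shows up as a SIGBUS on a store into the shared mapping, as the quoted
proposal describes. The helper names (install_sigbus_handler,
guarded_store, map_page_beyond_eof) are hypothetical, and the mapping
beyond end-of-file is only a stand-in that triggers SIGBUS without any
memcg setup.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;
static volatile sig_atomic_t sigbus_armed;

/* Jump back into guarded_store() if the fault happened there. */
static void on_sigbus(int sig)
{
    (void)sig;
    if (sigbus_armed)
        siglongjmp(sigbus_jmp, 1);
    /* Unexpected SIGBUS: fall back to the default (fatal) action. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Hypothetical helper: install the handler, returns 0 on success. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Hypothetical helper: store one byte, returning 0 on success and -1 if
 * the store raised SIGBUS. sigsetjmp(..., 1) saves the signal mask so
 * that SIGBUS is unblocked again after the jump out of the handler.
 */
int guarded_store(volatile char *addr, char value)
{
    if (sigsetjmp(sigbus_jmp, 1) != 0) {
        sigbus_armed = 0;
        return -1;
    }
    sigbus_armed = 1;
    *addr = value;
    sigbus_armed = 0;
    return 0;
}

/*
 * Stand-in for a mapping whose page cannot be charged: map one page of
 * an empty temporary file. Touching a page that lies entirely beyond
 * end-of-file raises SIGBUS. Returns NULL on failure.
 */
char *map_page_beyond_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    void *p;
    int fd = mkstemp(path);

    if (fd < 0)
        return NULL;
    unlink(path);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (char *)p;
}
```

A caller would run install_sigbus_handler() once and then treat a -1
from guarded_store() as "back off and retry later". Note that this only
helps the allocating task survive; it does nothing to free the tmpfs
memory that is pinning the target memcg at its limit.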
-- 
Michal Hocko
SUSE Labs
