Re: [PATCH] aio: Add memcg accounting of user used data

From: Michal Hocko <mhocko@kernel.org>
Date: 2017-12-05 15:43:10
Also in: lkml

On Tue 05-12-17 18:34:59, Kirill Tkhai wrote:

On 05.12.2017 18:15, Michal Hocko wrote:


On Tue 05-12-17 13:00:54, Kirill Tkhai wrote:


Currently, the number of available aio requests can be
limited only globally. There are two sysctl variables,
aio_max_nr and aio_nr, which implement the limit
and the request accounting. They help to avoid
the situation where all memory is eaten by in-flight
requests that are being written to a slow block device
and that can't be reclaimed by a shrinker.
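
For reference, the two global counters described above are exposed
to userspace as /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr.
A minimal sketch of reading them follows. The helper names
read_aio_counters and aio_headroom are hypothetical, and the
"headroom" figure is only a rough estimate of how many more
requests can be reserved system-wide, not the kernel's exact check:

```python
from pathlib import Path


def read_aio_counters(base="/proc/sys/fs"):
    """Return (aio_nr, aio_max_nr) read from the sysctl files under base."""
    base = Path(base)
    # Each file holds a single decimal number followed by a newline.
    aio_nr = int((base / "aio-nr").read_text().strip())
    aio_max_nr = int((base / "aio-max-nr").read_text().strip())
    return aio_nr, aio_max_nr


def aio_headroom(aio_nr, aio_max_nr):
    """Rough count of requests still available under the global limit."""
    # Clamp at zero so a lowered limit never yields a negative value.
    return max(aio_max_nr - aio_nr, 0)
```

On a Linux host, aio_headroom(*read_aio_counters()) gives the
remaining global budget. Because the budget is shared by every
container on the node, one container can drive it to zero.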

This becomes a problem when many containers
are used on one hardware node. Since aio_max_nr is
a global limit, any container may occupy all of the
available aio requests and deprive the others of the
ability to use aio at all. The situation may
happen because of malicious intent on the part of the
container's user or because of a program error, when
the user does this accidentally.

The patch fixes the problem. It adds memcg
accounting of user-used aio data (the biggest consumer
is the set of aio_kiocb objects; the ring buffer is the
second biggest), so a user of a given memcg won't be able
to allocate more aio request memory than the cgroup
allows, and will bump into the limit.

So what happens when we hit the hard limit and oom kill somebody?
Are those charged objects somehow bound to a process context?

There is exit_aio(), called from __mmput(), which waits until
the charged objects complete and decrements the reference counter.

OK, so it is bound to _a_ process context. The oom killer will not know
about which process has consumed those objects but the effect will be at
least reduced to a memcg.

If there were a problem with oom in memcg, there would be
the same problem on global oom, since there are
no __GFP_NOFAIL flags anywhere in the aio code.

But it seems everything is safe.

Could you share your testing scenario and how the system behaved
under heavy aio load?

I am not saying the patch is wrong, I am just trying to understand all
the consequences.
-- 
Michal Hocko
SUSE Labs
