Re: [PATCH] aio: Add memcg accounting of user used data
From: Michal Hocko <mhocko@kernel.org>
Date: 2017-12-05 15:43:10
Also in:
lkml
On Tue 05-12-17 18:34:59, Kirill Tkhai wrote:
> On 05.12.2017 18:15, Michal Hocko wrote:
>> On Tue 05-12-17 13:00:54, Kirill Tkhai wrote:
>>> Currently, the number of available aio requests can be limited only globally. There are two sysctl variables, aio_max_nr and aio_nr, which implement the limitation and request accounting. They help to avoid the situation where all the memory is eaten by in-flight requests, which are written by a slow block device and which can't be reclaimed by the shrinker.
>>>
>>> This becomes a problem when many containers are used on one hardware node. Since aio_max_nr is a global limit, any container may occupy all of the available aio requests and deprive the others of the possibility to use aio at all. The situation may happen because of evil intentions of the container's user or because of a program error, when the user does this accidentally.
>>>
>>> The patch fixes the problem. It adds memcg accounting of user used aio data (the biggest is the bunch of aio_kiocb; the ring buffer is the second biggest), so a user of a certain memcg won't be able to allocate more aio request memory than the cgroup allows, and he will bump into the limit.
>>
>> So what happens when we hit the hard limit and oom kill somebody? Are those charged objects somehow bound to a process context?
>
> There is exit_aio() called from __mmput(), which waits till the charged objects complete and decrement the reference counter.
OK, so it is bound to _a_ process context. The oom killer will not know which process has consumed those objects, but the effect will at least be confined to a memcg.
> If there was a problem with oom in memcg, there would be the same problem on global oom, as it can be seen there are no __GFP_NOFAIL flags anywhere in aio code. But it seems everything is safe.
Could you share your testing scenario and how the system behaved under heavy aio? I am not saying the patch is wrong, I am just trying to understand all the consequences.
-- 
Michal Hocko
SUSE Labs
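
For readers following the thread, a test of the kind asked about could be built from helpers like the ones sketched below. This is not the scenario the patch author used, which the thread does not show. It is a minimal userspace sketch that assumes a Linux system with kernel headers installed. The helper names (fill_aio_contexts, destroy_aio_contexts, read_sysctl) are hypothetical. The idea is to create aio contexts until io_setup() fails and record the errno of the first failure. Without per-memcg accounting the expected stopping point is the global aio-max-nr limit. How a memcg charge failure would surface under the patch is not stated in the thread, so the sketch only records the errno and does not interpret it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>

/* Read one integer from a /proc/sys file; returns -1 if it cannot be read. */
long read_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/*
 * Create up to max_ctx aio contexts with nr_events slots each.
 * Returns the number of contexts created; on the first failure the
 * errno value is stored in *err (0 if every io_setup() succeeded).
 * glibc has no io_setup() wrapper, so the raw syscall is used.
 */
int fill_aio_contexts(aio_context_t *ctxs, int max_ctx,
                      unsigned nr_events, int *err)
{
    int n = 0;

    assert(ctxs != NULL && err != NULL);
    *err = 0;
    while (n < max_ctx) {
        ctxs[n] = 0; /* io_setup() requires a zeroed context id */
        if (syscall(SYS_io_setup, nr_events, &ctxs[n]) < 0) {
            *err = errno;
            break;
        }
        n++;
    }
    return n;
}

/* Destroy the first n contexts created by fill_aio_contexts(). */
void destroy_aio_contexts(aio_context_t *ctxs, int n)
{
    for (int i = 0; i < n; i++)
        syscall(SYS_io_destroy, ctxs[i]);
}
```

A caller would run fill_aio_contexts() from a process placed in a memory-limited cgroup, compare read_sysctl("/proc/sys/fs/aio-nr") before and after, and check whether a second cgroup can still call io_setup() successfully while the first one holds its contexts.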