Thread: 22 messages, 4 authors, 2017-12-07

Re: [PATCH 0/5] blkcg: Limit maximum number of aio requests available for cgroup

From: Kirill Tkhai <hidden>
Date: 2017-12-04 22:49:52
Also in: lkml

On 05.12.2017 00:52, Tejun Heo wrote:
Hello, Kirill.

On Tue, Dec 05, 2017 at 12:44:00AM +0300, Kirill Tkhai wrote:
Can you please explain how this is a fundamental resource which can't
be controlled otherwise?
Currently, aio_nr and aio_max_nr are global. With containers, this
means that a single container may occupy all the aio requests available
in the system and deprive the others of the ability to use aio at all.
This may happen because of malicious intent on the part of the
container's user, or because of a program error that the user triggers
accidentally.
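
For reference, the two global counters named here are exposed to userspace as /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr. Below is a minimal sketch of reading them to see how much of the system-wide budget is left. The helper names read_counter() and aio_headroom() are hypothetical and exist only for this illustration; they are not part of the patch series.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from a procfs file.
 * Returns 0 and stores the value in *out on success,
 * -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, for illustration only.)
 */
int read_counter(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Compute how many aio requests can still be reserved system-wide,
 * i.e. aio-max-nr minus aio-nr. Both values are global, so any
 * process in any container draws from the same budget.
 * Returns 0 on success, -1 if either counter is unreadable.
 */
int aio_headroom(unsigned long *left)
{
    unsigned long nr, max;

    if (read_counter("/proc/sys/fs/aio-nr", &nr) != 0 ||
        read_counter("/proc/sys/fs/aio-max-nr", &max) != 0)
        return -1;

    *left = (max > nr) ? max - nr : 0;
    return 0;
}
```

Calling aio_headroom() before and after another process sets up a large aio context should show the remaining budget shrinking for everyone, which is the starvation scenario described above.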
Hmm... I see.  It feels really wrong to me to make this a first class
resource because there is a system wide limit.  The only reason I can
think of for the system wide limit is to prevent too much kernel
memory consumed by creating a lot of aios but that squarely falls
inside cgroup memory controller protection.  If there are other
reasons why the number of aios should be limited system-wide, please
bring them up.

If the only reason is kernel memory consumption protection, the only
thing we need to do is to make sure that memory used for aio commands
is accounted against cgroup kernel memory consumption, and to
relax or remove the system wide limit.
So, we just use the GFP_KERNEL_ACCOUNT flag when allocating internal aio
structures and pages, and all the memory will be accounted in kmem and
limited by memcg. Looks very good.

One detail about memory consumption. io_submit() calls the
file_operations::write_iter and read_iter primitives. It's not clear to
me whether they consume the same memory as the writev() or readv()
system calls would. writev() may delay the actual write until the dirty
pages limit is reached, so it seems the accounting logic should be the
same. So aio must not use more unaccounted system memory in filesystem
internals than a plain writev().
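
To make the comparison concrete, here is a small userspace sketch that issues the same write once through the aio syscalls and once through writev(). It only shows the two submission paths side by side; it says nothing about what the kernel accounts internally. The function names aio_write_once() and writev_once() are hypothetical, and the raw syscall() wrappers are used only to avoid assuming libaio is installed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write len bytes at offset off via io_setup/io_submit/io_getevents.
 * Returns the number of bytes written, or -1 on any failure.
 * (Hypothetical helper, for illustration only.)
 */
long aio_write_once(int fd, const void *buf, size_t len, long long off)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long ret;

    /* Reserves one request slot from the global aio_nr budget. */
    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    ret = syscall(SYS_io_submit, ctx, 1, list);
    if (ret == 1)
        /* Wait for exactly one completion, with no timeout. */
        ret = syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
    ret = (ret == 1) ? (long)ev.res : -1;

    syscall(SYS_io_destroy, ctx);
    return ret;
}

/*
 * Write len bytes at the current file position via writev().
 * Returns the number of bytes written, or -1 on failure.
 */
long writev_once(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    return (long)writev(fd, &iov, 1);
}
```

For a regular file opened without O_DIRECT, both helpers should end up dirtying page cache for the same file, which is why the question is whether the memory they consume is accounted the same way.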

Could you please say whether you have any thoughts about this?

Kirill