Thread: 14 messages, 4 authors, 2016-07-14

Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs

From: Johannes Weiner <hidden>
Date: 2016-06-17 16:43:19
Also in: linux-mm, lkml

On Fri, Jun 17, 2016 at 12:06:55PM +0300, Vladimir Davydov wrote:
> On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote:
> > The memory controller has quite a bit of state that usually outlives
> > the cgroup and pins its CSS until said state disappears. At the same
> > time it imposes a 16-bit limit on the CSS ID space to economically
> > store IDs in the wild. Consequently, when we use cgroups to contain
> > frequent but small and short-lived jobs that leave behind some page
> > cache, we quickly run into the 64k limit on outstanding CSSs.
> > Creating a new cgroup fails with -ENOSPC while there are only a few,
> > or even no user-visible cgroups in existence.
> >
> > Although pinning CSSs past cgroup removal is common, there are only
> > two instances that actually need a CSS ID after a cgroup is deleted:
> > cache shadow entries and swap-out records.
> >
> > Cache shadow entries reference the ID weakly and can deal with the CSS
> > having disappeared when it's looked up later. They pose no hurdle.
> >
> > Swap-out records do need to pin the css to hierarchically attribute
> > swapins after the cgroup has been deleted; though the only pages that
> > remain swapped out after a process exits are tmpfs/shmem pages. Those
> > references are under the user's control and thus manageable.
> >
> > This patch introduces a private 16-bit memcg ID and switches swap and
> > cache shadow entries over to using that. It then decouples the CSS
> > lifetime from the CSS ID lifetime, such that a CSS ID can be recycled
> > when the CSS is only pinned by common objects that don't need an ID.
>
> There's already an ID that is only used for online memory cgroups:
> kmemcg_id. Maybe, instead of introducing one more idr, we could name it
> generically and reuse it for shadow entries?

Good point. But mem_cgroup_idr seems to be the more generic of the two,
so it makes sense to switch slab accounting over to that. I'll look into
it, but as a refactoring patch on top of this fix.

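As a rough illustration of the decoupling described in the quoted changelog (this is not code from the patch), the userspace sketch below keeps one reference count for users that need the ID and leaves the object's own lifetime alone. All names here (struct memcg, id_table, memcg_id_alloc, memcg_id_get, memcg_id_put, memcg_from_id) are hypothetical stand-ins chosen for the example, and the linear scan replaces the idr the thread mentions. The lookup is "weak" only in the sense that it may return NULL once the ID has been released; how a real user handles a recycled ID is outside this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMCG_ID_MAX 65535u /* 16-bit ID space; 0 is reserved as "no ID" */

/* Hypothetical stand-in for the real memory cgroup structure. */
struct memcg {
    uint16_t id;       /* private ID, 0 once it has been released */
    unsigned id_refs;  /* users that need the ID (online state, swap records) */
    unsigned obj_refs; /* users that only pin the object (e.g. page cache) */
};

/* ID -> object table; a real implementation would use an idr instead. */
static struct memcg *id_table[MEMCG_ID_MAX + 1];

/* Assign the lowest free ID. Returns 0 on success, -1 if the space is full. */
static int memcg_id_alloc(struct memcg *m)
{
    for (uint32_t id = 1; id <= MEMCG_ID_MAX; id++) {
        if (id_table[id] == NULL) {
            id_table[id] = m;
            m->id = (uint16_t)id;
            m->id_refs = 1; /* initial reference, held while "online" */
            return 0;
        }
    }
    return -1;
}

/* Take an extra ID reference, e.g. for a swap-out record. */
static void memcg_id_get(struct memcg *m)
{
    assert(m->id_refs > 0);
    m->id_refs++;
}

/*
 * Drop an ID reference. When the last one goes away the ID becomes
 * reusable, even if obj_refs still keeps the object itself alive.
 */
static void memcg_id_put(struct memcg *m)
{
    assert(m->id_refs > 0);
    if (--m->id_refs == 0) {
        id_table[m->id] = NULL;
        m->id = 0;
    }
}

/* Weak lookup: returns NULL if no object currently holds this ID. */
static struct memcg *memcg_from_id(uint16_t id)
{
    return id_table[id];
}
```

In this model a group that is still pinned by leftover page cache (obj_refs > 0) no longer occupies a slot in the 16-bit space once its last ID reference is dropped, which is the property the fix is after.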
> Regarding swap entries, would it really make much difference if we used
> 4 bytes per swap page instead of 2? For a 100 GB swap it'd increase
> overhead from 50 MB up to 100 MB, which still doesn't seem too much IMO,
> so maybe just use plain unrestricted css->id for swap entries?

Yes and no. I agree that the increased consumption wouldn't be too
crazy, but if we have to maintain a 16-bit ID anyway, we might as well
use it for swap too to save that space. I don't think tmpfs and shmem
pins past offlining will be common enough to significantly eat into
the ID space of online cgroups.
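
The overhead figures quoted above can be checked with a little arithmetic. The sketch below assumes 4 KiB pages and binary units (GiB, MiB), neither of which the thread states; swap_id_overhead is a hypothetical helper written for this example, not a kernel function.

```c
#include <stdint.h>

/*
 * Bytes of per-page ID storage needed to cover swap_bytes of swap,
 * with one ID of id_size bytes recorded per swap page.
 */
static uint64_t swap_id_overhead(uint64_t swap_bytes, uint64_t page_size,
                                 uint64_t id_size)
{
    return (swap_bytes / page_size) * id_size;
}
```

Under those assumptions, 100 GiB of swap is 26,214,400 pages, so 2-byte IDs cost 50 MiB and 4-byte IDs cost 100 MiB, matching the numbers in the quoted question.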