Re: [PATCH 00/10] memcg: Kernel Memory Accounting.

[PATCH 00/10] memcg: Kernel Memory Accounting. · Suleiman Souhlal <hidden> · 2012-02-27
[PATCH 01/10] memcg: Kernel memory accounting infrastructure. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 01/10] memcg: Kernel memory accounting infrastructure. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 01/10] memcg: Kernel memory accounting infrastructure. · Suleiman Souhlal <hidden> · 2012-02-29
Re: [PATCH 01/10] memcg: Kernel memory accounting infrastructure. · Glauber Costa <hidden> · 2012-02-28
[PATCH 03/10] memcg: Reclaim when more than one page needed. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 03/10] memcg: Reclaim when more than one page needed. · KAMEZAWA Hiroyuki <hidden> · 2012-02-29
[PATCH 07/10] memcg: Stop res_counter underflows. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 07/10] memcg: Stop res_counter underflows. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 07/10] memcg: Stop res_counter underflows. · Suleiman Souhlal <hidden> · 2012-02-28
Re: [PATCH 07/10] memcg: Stop res_counter underflows. · Glauber Costa <hidden> · 2012-02-29
Re: [PATCH 07/10] memcg: Stop res_counter underflows. · Suleiman Souhlal <hidden> · 2012-02-29
[PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · KAMEZAWA Hiroyuki <hidden> · 2012-02-29
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-02-29
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Suleiman Souhlal <hidden> · 2012-02-29
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · KAMEZAWA Hiroyuki <hidden> · 2012-03-01
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-03-01
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · KAMEZAWA Hiroyuki <hidden> · 2012-03-01
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-03-03
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Suleiman Souhlal <hidden> · 2012-03-03
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-03-03
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Suleiman Souhlal <hidden> · 2012-03-04
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-03-06
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Suleiman Souhlal <hidden> · 2012-03-06
Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT. · Glauber Costa <hidden> · 2012-03-06
[PATCH 09/10] memcg: Per-memcg memory.kmem.slabinfo file. · Suleiman Souhlal <hidden> · 2012-02-27
[PATCH 05/10] memcg: Slab accounting. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 05/10] memcg: Slab accounting. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 05/10] memcg: Slab accounting. · Suleiman Souhlal <hidden> · 2012-02-28
Re: [PATCH 05/10] memcg: Slab accounting. · Glauber Costa <hidden> · 2012-02-29
[PATCH 10/10] memcg: Document kernel memory accounting. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 10/10] memcg: Document kernel memory accounting. · Randy Dunlap <hidden> · 2012-02-27
[PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · Suleiman Souhlal <hidden> · 2012-02-29
Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · Glauber Costa <hidden> · 2012-02-29
Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · KAMEZAWA Hiroyuki <hidden> · 2012-02-29
Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup. · Suleiman Souhlal <hidden> · 2012-02-29
[PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · Suleiman Souhlal <hidden> · 2012-02-28
Re: [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · KAMEZAWA Hiroyuki <hidden> · 2012-02-28
Re: [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · Glauber Costa <hidden> · 2012-02-29
Re: [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT. · Suleiman Souhlal <hidden> · 2012-02-29
[PATCH 06/10] memcg: Track all the memcg children of a kmem_cache. · Suleiman Souhlal <hidden> · 2012-02-27
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Pekka Enberg <penberg@kernel.org> · 2012-02-28
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Suleiman Souhlal <hidden> · 2012-02-28
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Glauber Costa <hidden> · 2012-02-28
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Suleiman Souhlal <hidden> · 2012-02-28
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Glauber Costa <hidden> · 2012-02-29
Re: [PATCH 00/10] memcg: Kernel Memory Accounting. · Suleiman Souhlal <hidden> · 2012-02-29

From: Suleiman Souhlal <hidden>
Date: 2012-02-28 22:47:53
Also in: linux-mm

Hello,

On Tue, Feb 28, 2012 at 5:03 AM, Glauber Costa [off-list ref] wrote:

Hi,


On 02/27/2012 07:58 PM, Suleiman Souhlal wrote:

quoted

This patch series introduces kernel memory accounting to memcg.
It currently only accounts for slab.

It's very similar to the patchset I sent back in October, but
with a number of fixes and improvements.
There is also overlap with Glauber's patchset, but I decided to
send this because there are some pretty significant differences.


Indeed. I will comment in details at your patches. I hope we can reach an
agreement soon about the best way to move forward.

Thanks a lot.

quoted

Slab gets accounted per-page, by using per-cgroup kmem_caches that
get created the first time an allocation of that type is done by
that cgroup.


I don't particularly care, but doesn't that introduce potential high
latencies for task within the cgroup?

Having the cgroup creation taking longer to complete is one thing,
but having an object allocation taking a lot longer than it would
take otherwise can be detrimental to some users. Again, for my usecase, this
is not terribly important.

Given that this is only done for the first allocation of a particular
type in a cgroup, and that cache creation shouldn't be *that*
expensive, I don't really think this is a big concern, unless your
cgroups are *extremely* short-lived.

That said, if you really think this is a problem, it would be trivial
to change the code to always  do the creation of the per-cgroup
kmem_caches asynchronously.
We already to do this when the first allocation is not GFP_KERNEL: We
let the allocation use the regular cache (so it won't be accounted to
the cgroup), and defer the creation to a workqueue.

Maybe we can pick a couple of caches that we know to be of more importance,
(like the dcache), and create them at memcg create time ?

I don't understand. Are you suggesting pre-allocating some caches,
like dcache, at cgroup creation, and letting "less important" caches
get created on-demand?
I guess that would be possible, but I'm not sure it's really
necessary, given what I mentioned above.

quoted

The main difference with Glauber's patches is here: We try to
track all the slab allocations, while Glauber only tracks ones
that are explicitly marked.
We feel that it's important to track everything, because there
are a lot of different slab allocations that may use significant
amounts of memory, that we may not know of ahead of time.


Note that "explicitly marking" does not mean not tracking them all in the
end. I am just quite worried about having the heavy-caches like the dcache
lying around without a proper memcg-shrinker.

I agree that we definitely need memcg-aware shrinkers.
We have patches for those, but I didn't want to include them in this
patchset, so that we could concentrate on getting the accounting right
first.
Once we can reach consensus on how to do the accounting, I will be
happy to start discussing shrinking. :-)

quoted

A per-cgroup kmem_cache will appear in slabinfo named like its
original cache, with the cgroup's name in parenthesis.


There is another problem associated with names. You can use just the
trailing piece of the cgroup name, since it may lead to duplicates like
memory/foo and memory/bar/foo all named "dcache-foo" (which I believe is
what you do here, right?). But then you need to include the full path, and
can end up with some quite monstruous things that will sure render
/proc/slabinfo useless for human readers.

I hadn't considered the fact that two cgroups could have the same base name.
I think having a name makes it a lot easier for a human to understand
which cgroup is using slab, so what about having both the base name of
the cgroup AND its css id, so that the caches are named like
"dentry(5:foo)"?
This should let us use the name of the cgroup while still being able
to distiguish cgroups that have the same base name.

I was thinking: How about we don't bother to show them at all, and instead,
show a proc-like file inside the cgroup with information about that cgroup?

One of the patches in the series adds a per-memcg memory.kmem.slabinfo.

quoted

       - There is a difference with non-memcg slab allocation in
         that with this, some slab allocations might fail when
         they never failed before. For example, a GFP_NOIO slab
         allocation wouldn't fail, but now it might.
         We have a change that makes slab accounting behave
         the same as non-accounted allocations, but I wasn't sure
         how important it was to include.

How about we don't account them, then ?

The patch we have (but I haven't included) makes it so that we account
for them most of the times. However, when we are in OOM conditions,
they get bypassed to the root (or not accounted for at all, if that's
what you prefer).
I guess I will make sure the patch is included in the v2 of this series.


Thanks for your comments.
-- Suleiman

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help