Re: [PATCH v4 04/14] kmem accounting basic infrastructure
From: Glauber Costa <hidden>
Date: 2012-10-12 07:36:51
Also in:
linux-mm, lkml
On 10/11/2012 02:11 PM, Michal Hocko wrote:
On Mon 08-10-12 14:06:10, Glauber Costa wrote:quoted
This patch adds the basic infrastructure for the accounting of the slab caches. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt * memory.kmem.max_usage_in_bytes They have the same meaning of their user memory counterparts. They reflect the state of the "kmem" res_counter. Per cgroup slab memory accounting is not enabled until a limit is sets/slab/kmem/ right?
right.
quoted
+static int memcg_update_kmem_limit(struct cgroup *cont, u64 val) +{ + int ret = -EINVAL; +#ifdef CONFIG_MEMCG_KMEM + struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); + /* + * For simplicity, we won't allow this to be disabled. It also can't + * be changed if the cgroup has children already, or if tasks had + * already joined. + * + * If tasks join before we set the limit, a person looking at + * kmem.usage_in_bytes will have no way to determine when it took + * place, which makes the value quite meaningless. + * + * After it first became limited, changes in the value of the limit are + * of course permitted. + * + * Taking the cgroup_lock is really offensive, but it is so far the only + * way to guarantee that no children will appear. There are plenty of + * other offenders, and they should all go away. Fine grained locking + * is probably the way to go here. When we are fully hierarchical, we + * can also get rid of the use_hierarchy check. + */ + cgroup_lock(); + mutex_lock(&set_limit_mutex); + if (!memcg->kmem_accounted && val != RESOURCE_MAX) {Just a nit but wouldn't memcg_kmem_is_accounted(memcg) be better than directly checking kmem_accounted? Besides that I am not sure I fully understand RESOURCE_MAX test. Say I want to have kmem accounting for monitoring so I do echo -1 > memory.kmem.limit_in_bytes so you set the value but do not activate it. Isn't this just a reminder from the time when the accounting could be deactivated?
No, not at all. I see you have talked about that in other e-mails, (I was on sick leave yesterday), so let me consolidate it all here: What we discussed before, regarding to echo -1 > ... was around the disable code, something that we no longer allow. So now, if you will echo -1 to that file *after* it is limited, you get in track only mode. But for you to start that, you absolutely have to write something different than -1. Just one example: libcgroup, regardless of how lame we think it is in this regard, will write to all cgroup files by default when a file is updated. If you haven't written anything, it will still write the same value that the file had before. This means that an already deployed libcg-managed installation will suddenly enable kmem for every cgroup. Sure this can be fixed in userspace, but: 1) There is no reason to break it, if we can 2) It is perfectly reasonable to expect that if you write to a file the same value that was already there, nothing happens. I'll update the docs to say that you can just write -1 *after* it is limited, but i believe enabling it has to be a very clear transition, for sanity's sake.
quoted
+ if (cgroup_task_count(cont) || (memcg->use_hierarchy && + !list_empty(&cont->children))) { + ret = -EBUSY; + goto out; + } + ret = res_counter_set_limit(&memcg->kmem, val);VM_BUG_IN(ret) ? There shouldn't be any usage when you enable it or something bad is going on.
Good point, this is indeed an impossible scenario I was just being overcautious about.