Thread (113 messages) 113 messages, 8 authors, 2012-08-22

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

From: Ying Han <hidden>
Date: 2012-08-15 20:20:18
Also in: linux-mm, lkml

On Tue, Aug 14, 2012 at 9:21 AM, Michal Hocko [off-list ref] wrote:
On Thu 09-08-12 17:01:12, Glauber Costa wrote:
quoted
This patch adds the basic infrastructure for the accounting of the slab
caches. To control that, the following files are created:

 * memory.kmem.usage_in_bytes
 * memory.kmem.limit_in_bytes
 * memory.kmem.failcnt
 * memory.kmem.max_usage_in_bytes

They have the same meaning of their user memory counterparts. They
reflect the state of the "kmem" res_counter.

The code is not enabled until a limit is set. This can be tested by the
flag "kmem_accounted". This means that after the patch is applied, no
behavioral changes exists for whoever is still using memcg to control
their memory usage.

We always account to both user and kernel resource_counters. This
effectively means that an independent kernel limit is in place when the
limit is set to a lower value than the user memory. A equal or higher
value means that the user limit will always hit first, meaning that kmem
is effectively unlimited.
Well, it contributes to the user limit so it is not unlimited. It just
falls under a different limit and it tends to contribute less. This can
be quite confusing.  I am still not sure whether we should mix the two
things together. If somebody wants to limit the kernel memory he has to
touch the other limit anyway.  Do you have a strong reason to mix the
user and kernel counters?
The reason to mix the two together is a compromise of the two use
cases we've heard by far. In google, we only need one limit which
limits u & k, and the reclaim kicks in when the total usage hits the
limit.
My impression was that kernel allocation should simply fail while user
allocations might reclaim as well. Why should we reclaim just because of
the kernel allocation (which is unreclaimable from hard limit reclaim
point of view)?
Some of kernel objects are reclaimable if we have per-memcg shrinker.
I also think that the whole thing would get much simpler if those two
are split. Anyway if this is really a must then this should be
documented here.
What would be the use case you have in your end?

--Ying
One nit bellow.
quoted
People who want to track kernel memory but not limit it, can set this
limit to a very high number (like RESOURCE_MAX - 1page - that no one
will ever hit, or equal to the user memory)

Signed-off-by: Glauber Costa <redacted>
CC: Michal Hocko <redacted>
CC: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Kamezawa Hiroyuki <redacted>
---
 mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b0e29f4..54e93de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
[...]
quoted
@@ -4046,8 +4059,23 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
                      break;
              if (type == _MEM)
                      ret = mem_cgroup_resize_limit(memcg, val);
-             else
+             else if (type == _MEMSWAP)
                      ret = mem_cgroup_resize_memsw_limit(memcg, val);
+             else if (type == _KMEM) {
+                     ret = res_counter_set_limit(&memcg->kmem, val);
+                     if (ret)
+                             break;
+                     /*
+                      * Once enabled, can't be disabled. We could in theory
+                      * disable it if we haven't yet created any caches, or
+                      * if we can shrink them all to death.
+                      *
+                      * But it is not worth the trouble
+                      */
+                     if (!memcg->kmem_accounted && val != RESOURCE_MAX)
+                             memcg->kmem_accounted = true;
+             } else
+                     return -EINVAL;
              break;
This doesn't check for the hierachy so kmem_accounted might not be in
sync with it's parents. mem_cgroup_create (below) needs to copy
kmem_accounted down from the parent and the above needs to check if this
is a similar dance like mem_cgroup_oom_control_write.

[...]
quoted
@@ -5033,6 +5098,7 @@ mem_cgroup_create(struct cgroup *cont)
      if (parent && parent->use_hierarchy) {
              res_counter_init(&memcg->res, &parent->res);
              res_counter_init(&memcg->memsw, &parent->memsw);
+             res_counter_init(&memcg->kmem, &parent->kmem);
              /*
               * We increment refcnt of the parent to ensure that we can
               * safely access it on res_counter_charge/uncharge.
@@ -5043,6 +5109,7 @@ mem_cgroup_create(struct cgroup *cont)
      } else {
              res_counter_init(&memcg->res, NULL);
              res_counter_init(&memcg->memsw, NULL);
+             res_counter_init(&memcg->kmem, NULL);
      }
      memcg->last_scanned_node = MAX_NUMNODES;
      INIT_LIST_HEAD(&memcg->oom_notify);
--
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help