Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

From: Luis Chamberlain <mcgrof@kernel.org>
Date: 2019-06-27 21:25:39
Also in: cgroups, linux-fsdevel, linux-mm, lkml

On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote:

On 6/26/19 4:19 PM, Roman Gushchin wrote:

quoted

 
+#ifdef CONFIG_MEMCG_KMEM
+static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
+				    void __maybe_unused *arg)
+{
+	struct kmem_cache *s;
+
+	if (memcg == root_mem_cgroup)
+		return;
+	mutex_lock(&slab_mutex);
+	list_for_each_entry(s, &memcg->kmem_caches,
+			    memcg_params.kmem_caches_node) {
+		kmem_cache_shrink(s);
+	}
+	mutex_unlock(&slab_mutex);
+	cond_resched();
+}

A couple of questions:
1) how about skipping already offlined kmem_caches? They are already shrunk,
   so you probably won't get much out of them. Or isn't it true?

I have been thinking about that. This patch is based on the linux tree
and so don't have an easy to find out if the kmem caches have been
shrinked. Rebasing this on top of linux-next, I can use the
SLAB_DEACTIVATED flag as a marker for skipping the shrink.

With all the latest patches, I am still seeing 121 out of a total of 726
memcg kmem caches (1/6) that are deactivated caches after system bootup
one of the test systems. My system is still using cgroup v1 and so the
number may be different in a v2 setup. The next step is probably to
figure out why those deactivated caches are still there.

quoted

2) what's your long-term vision here? do you think that we need to shrink
   kmem_caches periodically, depending on memory pressure? how a user
   will use this new sysctl?

Shrinking the kmem caches under extreme memory pressure can be one way
to free up extra pages, but the effect will probably be temporary.

quoted

What's the problem you're trying to solve in general?

At least for the slub allocator, shrinking the caches allow the number
of active objects reported in slabinfo to be more accurate. In addition,
this allow to know the real slab memory consumption. I have been working
on a BZ about continuous memory leaks with a container based workloads.

So.. this is still a work around?

The ability to shrink caches allow us to get a more accurate memory
consumption picture. Another alternative is to turn on slub_debug which
will then disables all the per-cpu slabs.

So this is a debugging mechanism?

Anyway, I think this can be useful to others that is why I posted the patch.

Since this is debug stuff, please add this to /proc/sys/debug/ instead.
That would reflect the intention, and would avoid the concern that folks
in production would use these things.

Since we only have 2 users of /proc/sys/debug/ I am now wondering if
would be best to add a new sysctl debug taint flag. This way bug
reports with these stupid knobs can got to /dev/null inbox for bug
reports.

Masami, /proc/sys/debug/kprobes-optimization is debug. Would you be OK
to add the taint for it too?

Masoud, /proc/sys/debug/exception-trace seems to actually be enabled
by default, and its goal seems to be to enable disabling it. So I
don't think it would make sense to taint there.

So.. maybe we need something /proc/sys/taints/ or
/proc/sys/debug/taints/ so its *very* clear this is by no way ever
expected to be used in production.

May even be good to long term add a symlink for vm/drop_caches there
as well?

  Luis

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help