Thread (127 messages) 127 messages, 9 authors, 2012-10-08

Re: [PATCH v3 04/13] kmem accounting basic infrastructure

From: Glauber Costa <hidden>
Date: 2012-09-27 18:48:36
Also in: cgroups, lkml

On 09/27/2012 09:46 PM, Tejun Heo wrote:
Hello,

On Thu, Sep 27, 2012 at 06:57:59PM +0400, Glauber Costa wrote:
quoted
quoted
Because we're not even trying to actually solve the problem but just
dumping it to userland.  If dentry/inode usage is the only case we're
being worried about, there can be better ways to solve it or at least
we should strive for that.
Not only it is not the only case we care about, this is not even touched
in this series. (It is only touched in the next one). This one, for
instance, cares about the stack. The reason everything is being dumped
into "kmem", is precisely to make things simpler. I argue that at some
point it makes sense to draw a line, and "kmem" is a much better line
than any fine grained control - precisely because it is conceptually
easier to grasp.
Can you please give other examples of cases where this type of issue
exists (plenty of shared kernel data structure which is inherent to
the workload at hand)?  Until now, this has been the only example for
this type of issues.
Yes. the namespace related caches (*), all kinds of sockets and network
structures, other file system structures like file struct, vm areas, and
pretty much everything a full container does.

(*) we run full userspace, so we have namespaces + cgroups combination.
quoted
quoted
I think the cost isn't too prohibitive considering it's already using
memcg.  Charging / uncharging happens only as pages enter and leave
slab caches and the hot path overhead is essentially single
indirection.  Glauber's benchmark seemed pretty reasonable to me and I
don't yet think that warrants exposing this subtle tree of
configuration.
Only so we can get some numbers: the cost is really minor if this is all
disabled. It this is fully enable, it can get to some 2 or 3 %, which
may or may not be acceptable to an application. But for me this is not
even about cost, and that's why I haven't brought it up so far
It seems like Mel's concern is mostly based on performance overhead
concerns tho.
quoted
quoted
The part I nacked is enabling kmemcg on a populated cgroup and then
starting accounting from then without any apparent indication that any
past allocation hasn't been considered.  You end up with numbers which
nobody can't tell what they really mean and there's no mechanism to
guarantee any kind of ordering between populating the cgroup and
configuring it and there's *no* way to find out what happened
afterwards neither.  This is properly crazy and definitely deserves a
nack.
Mel suggestion of not allowing this to happen once the cgroup has tasks
takes care of this, and is something I thought of myself.
You mean Michal's?  It should also disallow switching if there are
children cgroups, right?
No, I meant Mel, quoting this:

"Further I would expect that an administrator would be aware of these
limitations and set kmem_accounting at cgroup creation time before any
processes start. Maybe that should be enforced but it's not a
fundamental problem."

But I guess it is pretty much the same thing Michal proposes, in essence.

Or IOW, if your concern is with the fact that charges may have happened
in the past before this is enabled, we can make sure this cannot happen
by disallowing the limit to be set if currently unset (value changes are
obviously fine) if you have children or any tasks already in the group.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help