Thread: 57 messages, 7 authors, 2015-07-31

Re: [PATCH -mm v9 0/8] idle memory tracking

From: Michal Hocko <hidden>
Date: 2015-07-29 15:47:24
Also in: cgroups, linux-mm, lkml

On Wed 29-07-15 18:28:17, Vladimir Davydov wrote:
On Wed, Jul 29, 2015 at 04:26:19PM +0200, Michal Hocko wrote:
On Wed 29-07-15 16:59:07, Vladimir Davydov wrote:
On Wed, Jul 29, 2015 at 02:36:30PM +0200, Michal Hocko wrote:
On Sun 19-07-15 15:31:09, Vladimir Davydov wrote:
[...]
---- USER API ----

The user API consists of two new proc files:
I was thinking about this for a while. I dislike the interface. It is
quite awkward to use - e.g. you have to read the full memory to check
the idleness of a single memcg. This might turn out to be a problem,
especially on large machines.
Yes, with this API estimating the wss of a single memory cgroup will
cost almost as much as doing this for the whole system.
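
To illustrate the cost being discussed, here is a minimal sketch. It assumes the two per-page data sources (an idle flag and a memcg id for each page frame) have already been read into plain Python lists; the function names and the list layout are hypothetical and not part of the patch set.

```python
def idle_pages_by_cgroup(idle_flags, cgroup_ids):
    """Count idle pages per cgroup id in one pass over all page frames.

    idle_flags[pfn] -- True if the page frame is idle (toy stand-in)
    cgroup_ids[pfn] -- id of the memcg the page frame is charged to
    """
    counts = {}
    for idle, cg in zip(idle_flags, cgroup_ids):
        if idle:
            counts[cg] = counts.get(cg, 0) + 1
    return counts


def idle_pages_for_cgroup(idle_flags, cgroup_ids, target):
    # Still walks every page frame, even though only one cgroup is wanted.
    return sum(1 for idle, cg in zip(idle_flags, cgroup_ids)
               if idle and cg == target)
```

Both functions visit every page frame once, so asking about a single memcg costs about as much as asking about all of them.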

Come to think of it, does anyone really need to estimate idleness of one
particular cgroup?
It is certainly interesting for setting the low limit.
Yes, but IMO there is no point in setting the low limit for one
particular cgroup w/o considering what's going on with the rest of the
system.
If you use the low limit for isolating an important load then you do not
have to care about the others that much. All you care about is to set
a reasonable protection level and let the others compete for the rest.

[...]
I would assume that most users are interested only in a single number
which tells the idleness of the system/memcg.
Yes, that's what I need it for - estimating containers' wss for setting
their limits accordingly.
So why don't we export the single per memcg and global knobs then?
This would have a few advantages. First of all it would be much easier to
use, you wouldn't have to export memcg ids, and finally the implementation
could be changed without any user-visible changes (e.g. lru vs. pfn walks,
potential caching and who knows what). In other words: Michel had a
single-number interface AFAIR, so what was the primary reason to move away
from that API?
Because there is too much to be taken care of in the kernel with such an
approach and chances are high that it won't satisfy everyone. What
should the scan period be equal to?
No, just gather the data on the read request and let userspace
decide when/how often etc. If we are clever enough we can cache
the numbers and avoid the walk. Write to the file and do the
mark_idle stuff.
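
The read/write semantics suggested here can be modelled roughly as follows. This is only a toy userspace model under stated assumptions, not kernel code: the class name `IdleKnob` and its methods are hypothetical, pages are plain integers, and the caching idea is left out.

```python
class IdleKnob:
    """Toy model of a single-number per-memcg idle file."""

    def __init__(self, pages):
        # No page is considered idle until userspace writes to the knob.
        self.idle = {page: False for page in pages}

    def write(self):
        # A write marks every page idle (the "mark_idle stuff").
        for page in self.idle:
            self.idle[page] = True

    def touch(self, page):
        # Any access to a page clears its idle mark.
        self.idle[page] = False

    def read(self):
        # A read gathers the number on demand; userspace picks the interval.
        return sum(self.idle.values())


knob = IdleKnob(range(4))
knob.write()      # start a measurement interval
knob.touch(1)     # the workload touches one page
idle_now = knob.read()
```

In this model the scan period is simply the time userspace lets pass between the write and the read, so no in-kernel period setting is needed.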
Knob. How many kthreads do we want?
Knob. I want to keep history for last N intervals (this was a part of
Michel's implementation), what should N be equal to? Knob.
This all relates to the kernel thread implementation which I wasn't
suggesting. I was referring to Michel's work which might induce that.
I was merely referring to a single number output. Sorry about the
confusion.
I want to be
able to choose between an instant scan and a scan distributed in time.
Knob. I want to see stats for anon/locked/file/dirty memory separately,
Why is this useful for setting the memcg limits or for the wss estimation? I
can imagine that a further breakdown of the numbers might be interesting
from the debugging POV but I fail to see what kind of decisions you
would make in userspace based on them.

[...]
Yes this is really tricky with the current LRU implementation. I
was playing with some ideas (do some checkpoints on the way) but
none of them was really working out on a busy system. But the LRU
implementation might change in the future.
It might. Then we could come up with a new /proc or /sys file which
would do the same as /proc/kpageidle, but on per LRU^w whatever-it-is
basis, and give people a choice which one to use.
This just leads to the explosion in the number of proc files that we are
already seeing... Proc ended up as a dumping ground for different things which
didn't fit elsewhere, and I am not very happy about it, to be honest.

[...]
-- 
Michal Hocko
SUSE Labs