Thread: 45 messages, 4 authors, 2018-01-25

Re: [PATCH 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes

From: Andrey Ryabinin <hidden>
Date: 2017-12-21 09:57:14
Also in: linux-mm, lkml


On 12/20/2017 09:15 PM, Shakeel Butt wrote:
On Wed, Dec 20, 2017 at 3:34 AM, Michal Hocko [off-list ref] wrote:
quoted
On Wed 20-12-17 14:32:19, Andrey Ryabinin wrote:
quoted
On 12/20/2017 01:33 PM, Michal Hocko wrote:
quoted
On Wed 20-12-17 13:24:28, Andrey Ryabinin wrote:
quoted
mem_cgroup_resize_[memsw]_limit() tries to free only 32 (SWAP_CLUSTER_MAX)
pages on each iteration. This makes it practically impossible to decrease
the limit of a memory cgroup. Tasks could easily allocate back 32 pages,
so we can't reduce memory usage, and once retry_count reaches zero we return
-EBUSY.

It's easy to reproduce the problem by running the following commands:

  mkdir /sys/fs/cgroup/memory/test
  echo $$ >> /sys/fs/cgroup/memory/test/tasks
  cat big_file > /dev/null &
  sleep 1 && echo $((100*1024*1024)) > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
  -bash: echo: write error: Device or resource busy

Instead of trying to free a small number of pages, it's much more
reasonable to free 'usage - limit' pages.
But that only makes the issue less probable. It doesn't fix it because
            if (curusage >= oldusage)
                    retry_count--;
can still be true because allocator might be faster than the reclaimer.
Wouldn't it be more reasonable to simply remove the retry count and keep
trying until interrupted or we manage to update the limit?
But does it make sense to continue reclaiming even if the reclaimer can't
make any progress? I'd say no. "Allocator is faster than reclaimer"
may not be the only reason for failed reclaim. E.g. we could try to
set the limit lower than the amount of mlock()ed memory in the cgroup;
retrying reclaim would be just a waste of the machine's resources. Or we
simply don't have any swap, and anon > new_limit. Should we burn the CPU in
that case?
We can check the number of reclaimed pages and go EBUSY if it is 0.
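To make the suggestion above concrete, here is a minimal user-space sketch of a resize loop that drops the retry counter and fails with -EBUSY as soon as a reclaim pass frees nothing. It is a toy model, not the kernel code: struct model_cgroup, model_reclaim() and model_resize_limit() are hypothetical names, there is no concurrent allocator, and the signal check a real loop would need is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for a memory cgroup; all values are page counts. */
struct model_cgroup {
	unsigned long usage;         /* pages currently charged */
	unsigned long limit;         /* current hard limit */
	unsigned long unreclaimable; /* e.g. mlock()ed pages, or anon without swap */
};

/*
 * Hypothetical reclaimer: frees up to nr pages but never touches the
 * unreclaimable part. Returns the number of pages actually freed.
 */
static unsigned long model_reclaim(struct model_cgroup *cg, unsigned long nr)
{
	unsigned long reclaimable = 0;
	unsigned long freed;

	if (cg->usage > cg->unreclaimable)
		reclaimable = cg->usage - cg->unreclaimable;

	freed = nr < reclaimable ? nr : reclaimable;
	cg->usage -= freed;
	return freed;
}

/*
 * Try to lower the limit. Each pass asks for 'usage - new_limit' pages
 * rather than a fixed small batch, and the loop gives up with -EBUSY
 * only when a pass reclaims zero pages. The limit is committed only
 * once usage fits under it.
 */
static int model_resize_limit(struct model_cgroup *cg, unsigned long new_limit)
{
	while (cg->usage > new_limit) {
		if (model_reclaim(cg, cg->usage - new_limit) == 0)
			return -EBUSY; /* no progress: stop burning CPU */
	}

	cg->limit = new_limit;
	return 0;
}
```

In this model, a cgroup with usage 1000 and 800 unreclaimable pages asked to shrink to 500 frees 200 pages on the first pass, frees nothing on the second, and returns -EBUSY with the old limit still in place.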
quoted
quoted
Another option would be to commit the new limit and allow temporary overcommit
of the hard limit. New allocations and the limit update paths would
reclaim to the hard limit.
It sounds a bit fragile and tricky to me. I wouldn't go that way
unless we have a very good reason for this.
I haven't explored this, to be honest, so there may be dragons that way.
I've just mentioned that option for completeness.
We already do this for cgroup-v2's memory.max. So, I don't think it is
fragile or tricky.
It has the potential to break userspace expectations. Userspace might expect that
lowering limit_in_bytes too much fails with EBUSY and doesn't trigger the OOM killer.
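The difference in userspace-visible behaviour can be illustrated with a small toy model. This is only a sketch under stated assumptions: struct toy_cgroup, toy_reclaim() and both resize functions are hypothetical, the TOY_OOM result merely marks the point where the OOM killer could be invoked, and nothing here is taken from the kernel sources.

```c
/* Possible results of a limit update in this toy model. */
enum toy_outcome { TOY_OK, TOY_EBUSY, TOY_OOM };

/* Hypothetical cgroup state; all values are page counts. */
struct toy_cgroup {
	unsigned long usage;
	unsigned long limit;
	unsigned long unreclaimable;
};

/* Hypothetical reclaimer: frees up to nr reclaimable pages. */
static unsigned long toy_reclaim(struct toy_cgroup *cg, unsigned long nr)
{
	unsigned long reclaimable = 0;
	unsigned long freed;

	if (cg->usage > cg->unreclaimable)
		reclaimable = cg->usage - cg->unreclaimable;

	freed = nr < reclaimable ? nr : reclaimable;
	cg->usage -= freed;
	return freed;
}

/* Reclaim first, commit the limit only if usage fits: failure is EBUSY. */
static enum toy_outcome toy_resize_reclaim_first(struct toy_cgroup *cg,
						 unsigned long new_limit)
{
	while (cg->usage > new_limit) {
		if (toy_reclaim(cg, cg->usage - new_limit) == 0)
			return TOY_EBUSY; /* limit left untouched */
	}
	cg->limit = new_limit;
	return TOY_OK;
}

/*
 * Commit the limit first, then reclaim down to it. The write itself
 * cannot fail with EBUSY; if reclaim stalls, the cgroup is over its
 * hard limit and the only way forward is killing something.
 */
static enum toy_outcome toy_resize_commit_first(struct toy_cgroup *cg,
						unsigned long new_limit)
{
	cg->limit = new_limit;
	while (cg->usage > cg->limit) {
		if (toy_reclaim(cg, cg->usage - cg->limit) == 0)
			return TOY_OOM; /* model only: OOM killer territory */
	}
	return TOY_OK;
}
```

With usage 1000, 800 unreclaimable pages and a requested limit of 500, the reclaim-first variant returns TOY_EBUSY and keeps the old limit, while the commit-first variant installs the new limit and ends in TOY_OOM.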