Re: [PATCH v2 1/3] mm: Fix dropped memcg from mem cgroup soft limit tree
From: Michal Hocko <hidden>
Date: 2021-02-18 19:45:58
Also in: linux-mm, lkml
On Thu 18-02-21 10:30:20, Tim Chen wrote:
On 2/18/21 12:24 AM, Michal Hocko wrote:
I have already acked this patch in the previous version along with a Fixes tag. It seems that my review feedback has been completely ignored, also for other patches in this series.
Michal, my apologies. Our mail system screwed up and some mail went missing, so I completely missed your mail. I only saw it now after looking at lore.kernel.org.
I see. My apologies for suspecting you of ignoring my review.
Responding to your comment:
Have you observed this happening in real life? I do agree that the threshold based updates of the tree are not ideal but the whole soft reclaim code is far from optimal. So why do we care only now? The feature is essentially dead and fine tuning it sounds like a step back to me.
Yes, I did see the issue mentioned in patch 2 breaking soft limit reclaim for cgroup v1. There are still some of our customers using cgroup v1 so we would like to fix this if possible.
It would be great to see more details.
For patch 3 regarding the uncharge_batch, it is more of an observation that we should uncharge in batches of the same node, and it was not prompted by an actual workload. Thinking more about this, the worst that could happen is that some entries in the soft limit tree overestimate the memory used, which at most leads to a soft page reclaim on that cgroup. The overhead from the extra memcg event update could be more than a soft page reclaim pass. So let's drop patch 3 for now.
I would still prefer to handle that in the soft limit reclaim path and check each memcg for the soft limit reclaim excess before the reclaim.
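To illustrate the idea of re-checking the excess at reclaim time, here is a toy Python model. It is only a sketch and not kernel code: the `Memcg` class, its fields, and `pick_reclaim_victims` are hypothetical names made up for this example, and the list stands in for the soft limit tree. It shows that an entry whose usage dropped after it was queued (an overestimating entry) can simply be skipped before any reclaim is attempted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not the kernel struct)."""
    name: str
    usage_pages: int       # pages currently charged to the group
    soft_limit_pages: int  # configured soft limit, in pages


def soft_limit_excess(memcg: Memcg) -> int:
    """Return how many pages the group is above its soft limit, or 0."""
    return max(memcg.usage_pages - memcg.soft_limit_pages, 0)


def pick_reclaim_victims(tree: List[Memcg]) -> List[Memcg]:
    """Re-check each queued entry and keep only those still in excess.

    `tree` models entries that were queued when they exceeded their soft
    limit; their usage may have dropped since, so stale entries are skipped.
    """
    victims = [m for m in tree if soft_limit_excess(m) > 0]
    # Largest current excess first (an assumption of this sketch).
    victims.sort(key=soft_limit_excess, reverse=True)
    return victims


# Example: "b" was queued earlier but is now below its soft limit.
queued = [Memcg("a", 120, 100), Memcg("b", 80, 100), Memcg("c", 300, 100)]
selected = [m.name for m in pick_reclaim_victims(queued)]
```

In this model the stale entry costs one cheap comparison at reclaim time instead of an extra event update on every uncharge.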
Let me know if you would like me to resend patch 1 with the Fixes tag
for commit 4e41695356fb ("memory controller: soft limit reclaim on contention")
and if there are any changes I should make for patch 2.
I will ack and suggest Fixes.
Thanks.
Tim
--
Michal Hocko
SUSE Labs