Re: [PATCH V7 2/2] mm: memcg detect no memcgs above softlimit under zone reclaim
From: Rik van Riel <hidden>
Date: 2012-08-06 22:54:49
On 08/06/2012 05:18 PM, Ying Han wrote:
> On Mon, Aug 6, 2012 at 11:51 AM, Rik van Riel wrote:
>> On 08/06/2012 11:11 AM, Michal Hocko wrote:
>>> On Mon 06-08-12 10:27:25, Rik van Riel wrote:
>>>
>>> So you think we shouldn't do the full round over memcgs in shrink_zone
>>> and rather do it oom way to pick up a victim and hammer it?
>>
>> Not hammer it too far. Only until its score ends up well below (25%
>> lower?) than that of the second highest scoring list.
>>
>> That way all the lists get hammered a little bit, in turn.
>>
>>> How do we provide the soft limit guarantee then? [...]
>>
>> The easiest way would be to find the top 2 or 3 scoring memcgs when we
>> reclaim memory. After reclaiming some pages, recalculate the scores of
>> just these top lists, and see if the list we started out with now has a
>> lower score than the second one.
>>
>> Once we have reclaimed some from each of the 2 or 3 lists, we can go
>> back and find the highest priority lists again.
>
> Sounds like quite a lot of calculation to pick which memcg to reclaim
> from, and I wonder if that is necessary at all.
>
> For most of the use cases, we don't need to pick the lowest score memcg
> to reclaim from first. My understanding is that if we can respect the
> (lru_size - softlimit) to calculate the nr_to_scan, that is a good move
> from what we have today.
>
> If so, can we just still do the round-robin fashion in shrink_zone() and
> for each memcg, we calculate the nr_to_scan similar to get_scan_count()
> what we have today but w/ the new formula. For memcg under its softlimit,
> we avoid reclaiming pages unless no more pages can be reclaimed, and then
> we start reclaiming under the softlimit. That part can use the same logic
> depending on (softlimit - lru_size)
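To make the round-robin proposal quoted above concrete, here is a rough userspace sketch in Python, not kernel code. The `Memcg` class, `nr_to_scan()` and `round_robin_pass()` are hypothetical names for illustration only. It assumes the scan target is the excess (lru_size - softlimit) shifted right by the reclaim priority, in the spirit of the priority shift in get_scan_count(), and that groups under their softlimit are only scanned when no group is over it. The fallback formula is an assumption; the proposal leaves it open.

```python
from dataclasses import dataclass


@dataclass
class Memcg:
    """Hypothetical stand-in for one memcg's per-zone state."""
    name: str
    lru_size: int    # pages on this group's LRU lists in the zone
    soft_limit: int  # soft limit, in pages


def nr_to_scan(cg: Memcg, priority: int) -> int:
    """Scan target from the excess over the soft limit, scaled by priority."""
    excess = max(cg.lru_size - cg.soft_limit, 0)
    return excess >> priority


def round_robin_pass(memcgs: list[Memcg], priority: int) -> dict[str, int]:
    """Visit every memcg once and return a scan target for each.

    Groups under their soft limit get a target of 0 as long as at least
    one group is over its limit. If nobody is over, fall back to scanning
    every group in proportion to its LRU size (an assumed fallback).
    """
    if any(cg.lru_size > cg.soft_limit for cg in memcgs):
        return {cg.name: nr_to_scan(cg, priority) for cg in memcgs}
    return {cg.name: cg.lru_size >> priority for cg in memcgs}
```

Note that this pass still visits every group, which is the cost the reply below is concerned about.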
If we do the round robin, we will not know in advance whether or not
there are memcgs over (or under) the softlimit.

Another thing to consider is that the round robin code will always
iterate over the cgroups AND try to reclaim a little from every one of
them.

The first version of my code will just iterate over them to pick the
highest priority cgroups, and will then reclaim from that (or those)
groups. This is less work than what your code does right now.

In the future, we can find a way to sort the cgroups (in a tree?), so we
do not have to walk over all of them.

Some workloads have thousands of cgroups on a system. Iterating over all
of them is not going to scale; it will be an inconvenience when just
calculating the priority, and has the potential to be a total disaster
when doing a little bit of reclaim from every one of them.

Let's look at this one step at a time.

-- 
All rights reversed
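For illustration, a minimal userspace sketch in Python of the "pick the highest scoring group and reclaim until it is well below the runner-up" idea. It is not the actual patch. The scoring function (excess over the softlimit), the batch size, and the names `Memcg`, `score()` and `reclaim_from_top()` are assumptions; only the top 2 or 3 selection and the 25% margin come from the discussion.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Memcg:
    """Hypothetical stand-in for one memcg's per-zone state."""
    name: str
    lru_size: int    # pages on this group's LRU lists in the zone
    soft_limit: int  # soft limit, in pages


def score(cg: Memcg) -> int:
    """Assumed score: pages over the soft limit (0 if under it)."""
    return max(cg.lru_size - cg.soft_limit, 0)


def reclaim_from_top(memcgs, batch=32, margin=0.25, top_n=3):
    """Reclaim from the highest scoring memcg only.

    One pass over all groups finds the top_n scores. Pages are then taken
    from the leader in batches, rechecking only its own score, until that
    score is at least `margin` below the runner-up's. Returns the victim's
    name and the number of pages reclaimed, or (None, 0) if no group is
    over its soft limit.
    """
    top = heapq.nlargest(top_n, memcgs, key=score)
    if not top or score(top[0]) == 0:
        return None, 0

    victim = top[0]
    runner_up = score(top[1]) if len(top) > 1 else 0
    target = runner_up * (1 - margin)

    reclaimed = 0
    while score(victim) > target:
        # Never take more than the remaining excess, so the victim is
        # not pushed below its soft limit.
        take = min(batch, score(victim))
        victim.lru_size -= take  # stands in for actually freeing pages
        reclaimed += take
    return victim.name, reclaimed
```

Calling `reclaim_from_top()` repeatedly moves on to the next group once the previous leader has dropped below it, so the groups take turns. The selection pass is still a linear walk here; a sorted structure, as suggested above, would be needed to avoid visiting every cgroup.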