Re: [PATCH V7 2/2] mm: memcg detect no memcgs above softlimit under zone reclaim
From: Michal Hocko <hidden>
Date: 2012-08-06 14:03:56
On Wed 01-08-12 16:10:32, Rik van Riel wrote:
> On 08/01/2012 03:04 PM, Ying Han wrote:
>
> > That is true. Hmm, then two things i can do:
> >
> > 1. for kswapd case, make sure not counting the root cgroup
> > 2. or check nr_scanned. I like the nr_scanned which is telling us
> >    whether or not the reclaim ever make any attempt ?
>
> I am looking at a more advanced case of (3) right now. Once I have
> the basics working, I will send you a prototype (that applies on top
> of your patches) to play with.
>
> Basically, for every LRU in the system, we can keep track of 4 things:
>
> - reclaim_stat->recent_scanned
> - reclaim_stat->recent_rotated
> - reclaim_stat->recent_pressure
> - LRU size
>
> The first two represent the fraction of pages on the list that are
> actively used. The larger the fraction of recently used pages, the
> more valuable the cache is. The inverse of that can be used to show
> us how hard to reclaim this cache, compared to other caches
> (everything else being equal).
>
> The recent pressure can be used to keep track of how many pages we
> have scanned on each LRU list recently. Pressure is scaled with LRU
> size.
>
> This would be the basic formula to decide which LRU to reclaim from:
>
>          recent_scanned       LRU size
> score =  --------------  *  ---------------
>          recent_rotated     recent_pressure
>
> In other words, the less the objects on an LRU are used, the more we
> should reclaim from that LRU. The larger an LRU is, the more we
> should reclaim from that LRU.
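The proposed formula can be tried out with a toy userspace sketch in Python. This is not kernel code: the `LruStats` class, the `score` helper and all numbers are hypothetical, and only the four tracked quantities come from the description above. The guard against division by zero is also an assumption, since the proposal does not say how empty counters would be handled.

```python
from dataclasses import dataclass


@dataclass
class LruStats:
    """Hypothetical per-LRU counters mirroring the four tracked values."""
    recent_scanned: int   # pages scanned recently
    recent_rotated: int   # pages found in use and rotated back
    recent_pressure: int  # recent scan pressure applied to this LRU
    size: int             # number of pages on the LRU


def score(lru: LruStats) -> float:
    """score = (recent_scanned / recent_rotated) * (size / recent_pressure).

    The max(..., 1) guards are an assumption of this sketch to avoid
    dividing by zero on a freshly created LRU.
    """
    inverse_usage = lru.recent_scanned / max(lru.recent_rotated, 1)
    size_per_pressure = lru.size / max(lru.recent_pressure, 1)
    return inverse_usage * size_per_pressure


# Hypothetical numbers: same size and pressure, different rotation rates.
cold = LruStats(recent_scanned=1000, recent_rotated=100,
                recent_pressure=50, size=10000)
hot = LruStats(recent_scanned=1000, recent_rotated=800,
               recent_pressure=50, size=10000)

print(score(cold), score(hot))
```

With these made-up inputs the rarely rotated ("cold") LRU scores higher than the frequently rotated ("hot") one, which matches the stated intent: the less the pages on an LRU are used, the more it should be reclaimed from.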
The formula makes sense, but I am afraid it will be hard to tune it into something that doesn't regress. For example, I have seen workloads with many small groups that are used to wrap backup jobs. Those groups are scanned a lot, and you would also see many rotations because of writeback, yet they are definitely better targets to scan than a large group which needs to keep its data resident.

Anyway, I am not saying the score approach is a bad idea, but I am afraid it will be hard to validate and to get right.
> The more we have already scanned an LRU, the lower its score becomes.
> At some point, another LRU will have the top score, and that will be
> the target to scan.
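The way the reclaim target would move from one LRU to another can be sketched as a small simulation. Everything here is an assumption made for illustration: the `pick_targets` helper, the fixed scan batch of 32 pages added to the pressure counter on each round, and the numbers for the two LRUs "A" and "B". The thread does not specify how pressure would actually be accumulated or decayed.

```python
def score(scanned, rotated, size, pressure):
    """Basic score from the proposal; max(..., 1) guards are an assumption."""
    return (scanned / max(rotated, 1)) * (size / max(pressure, 1))


def pick_targets(lrus, rounds, batch=32):
    """Pick the top-scoring LRU each round and charge it `batch` pressure.

    Works on a copy so the caller's dictionary is left untouched.
    Returns the names of the chosen LRUs in order.
    """
    state = {name: dict(stats) for name, stats in lrus.items()}
    order = []
    for _ in range(rounds):
        name = max(state, key=lambda n: score(**state[n]))
        state[name]["pressure"] += batch  # scanning lowers the next score
        order.append(name)
    return order


# Hypothetical LRUs: A is used less (fewer rotations) than B.
lrus = {
    "A": dict(scanned=100, rotated=10, size=1000, pressure=32),
    "B": dict(scanned=100, rotated=25, size=1000, pressure=32),
}

print(pick_targets(lrus, rounds=4))
```

In this run A is scanned first, its score drops as pressure accumulates, and B takes over as the target for one round before A leads again. Without some decay of `pressure` over time the scores only ever fall, so a real implementation would need an aging step that this sketch leaves out.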
So you think we shouldn't do the full round over memcgs in shrink_zone, and should rather do it the OOM way: pick a victim and hammer it?
> We can adjust the score for different LRUs in different ways, e.g.:
>
> - swappiness adjustment for file vs anon LRUs, within an LRU set
> - if an LRU set contains a file LRU with more inactive than active
>   pages, reclaim from this LRU set first
> - if an LRU set is over its soft limit, reclaim from this LRU set first
Maybe we could replace LRU size with (LRU size - soft_limit) in the above formula?
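The effect of that substitution can be checked with a short worked example in Python. This is only a sketch: `soft_limit_score` is a hypothetical helper, the numbers are invented, LRU size and soft limit are assumed to be expressed in the same unit (pages), and clamping the excess at zero for groups below their soft limit is an assumption that the suggestion itself does not spell out.

```python
def plain_score(scanned, rotated, size, pressure):
    """Original proposal: (scanned / rotated) * (size / pressure)."""
    return (scanned / max(rotated, 1)) * (size / max(pressure, 1))


def soft_limit_score(scanned, rotated, size, pressure, soft_limit):
    """Variant with LRU size replaced by (LRU size - soft_limit).

    Assumption: the excess is clamped at zero, so a group that is under
    its soft limit gets a score of 0 instead of a negative one.
    """
    excess = max(size - soft_limit, 0)
    return (scanned / max(rotated, 1)) * (excess / max(pressure, 1))


# Hypothetical groups with identical usage and pressure.
big = dict(scanned=100, rotated=10, size=10000, pressure=50)
small = dict(scanned=100, rotated=10, size=2000, pressure=50)

print(plain_score(**big), plain_score(**small))
print(soft_limit_score(**big, soft_limit=9000),
      soft_limit_score(**small, soft_limit=500))
```

With the plain formula the big group is the preferred target simply because it is big. With the soft-limit variant the small group, which exceeds its limit by more pages, ranks first, and a group sitting below its soft limit would not be chosen at all under the clamping assumption.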
> This also gives us a nice way to balance memory pressure between
> zones, etc...
-- 
Michal Hocko
SUSE Labs