Re: [PATCH] memcg: replace ss->id_lock with a rwlock
From: Ying Han <hidden>
Date: 2011-08-24 04:12:07
On Tue, Aug 23, 2011 at 9:10 PM, Ying Han [off-list ref] wrote:
On Fri, Aug 19, 2011 at 6:55 AM, Johannes Weiner [off-list ref]wrote:quoted
Hello Andrew, On Wed, Aug 10, 2011 at 11:20:33AM -0700, Andrew Bresticker wrote:quoted
While back-porting Johannes Weiner's patch "mm: memcg-aware globalreclaim"quoted
for an internal effort, we noticed a significant performance regression during page-reclaim heavy workloads due to high contention of thess->id_lock.quoted
This lock protects idr map, and serializes calls to idr_get_next() in css_get_next() (which is used during the memcg hierarchy walk). Since idr_get_next() is just doing a look up, we need only serialize it with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a rwlock, contention is greatly reduced and performance improves. Tested: cat a 256m file from a ramdisk in a 128m container 50 times on each core (one file + container per core) in parallel on a NUMA machine. Result is the time for the test to complete in 1 of the containers. Both kernels included Johannes' memcg-aware global reclaim patches. Before rwlock patch: 1710.778s After rwlock patch: 152.227sThe reason why there is much more hierarchy walking going on is because there was actually a design bug in the hierarchy reclaim. The old code would pick one memcg and scan it at decreasing priority levels until SCAN_CLUSTER_MAX pages were reclaimed. For each memcg scanned with priority level 12, there were SWAP_CLUSTER_MAX pages reclaimed. My last revision would bail the whole hierarchy walk once it reclaimed SWAP_CLUSTER_MAX. Also, at the time, small memcgs were not force-scanned yet. So 128m containers would force the priority level to 10 before scanning anything at all (128M / pagesize >> priority), and then bail after one or two scanned memcgs. This means that for each SWAP_CLUSTER_MAX reclaimed pages there was a nr_of_containers * 2 overhead of just walking the hierarchy to no avail.Good point. To make it a bit clear, the revision which bails out the hierarchy_walk based on nr_reclaimed is that we are looking at right now.quoted
I changed this and removed the bail condition based on the number of reclaimed pages. Instead, the cycle ends when all reclaimers together made a full round-trip through the hierarchy. The more cgroups, the more likely that there are several tasks going into reclaim concurrently, it should be a reasonable share of work for each one.The number of reclaim invocations, thus the number of hierarchy walks,quoted
is back to sane levels again and the id_lock contention should be less of an issue.looking forward to see the change.quoted
Your patch still makes sense, but it's probably less urgent.I think the patch itself make senses regardless of the global reclaim change. It seems to be a optimization in general. --Ying