Re: [patch] Revert "memcg: add memory.vmscan_stat"
From: Johannes Weiner <hidden>
Date: 2011-08-30 07:05:34
Also in:
lkml
On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
On Mon, 29 Aug 2011 17:51:13 +0200 Johannes Weiner [off-list ref] wrote:quoted
On Tue, Aug 09, 2011 at 08:33:45AM +0900, KAMEZAWA Hiroyuki wrote:quoted
On Mon, 8 Aug 2011 14:43:33 +0200 Johannes Weiner [off-list ref] wrote:quoted
On Fri, Jul 22, 2011 at 05:15:40PM +0900, KAMEZAWA Hiroyuki wrote:quoted
+When under_hierarchy is added in the tail, the number indicates the +total memcg scan of its children and itself.In your implementation, statistics are only accounted to the memcg triggering the limit and the respectively scanned memcgs. Consider the following setup: A / \ B C / D If D tries to charge but hits the limit of A, then B's hierarchy counters do not reflect the reclaim activity resulting in D.yes, as I expected.Andrew, with a flawed design, the author unwilling to fix it, and two NAKs, can we please revert this before the release?How about this ?
quoted hunk ↗ jump to hunk
@@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s spin_lock(&memcg->scanstat.lock); __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec); spin_unlock(&memcg->scanstat.lock); - - memcg = rec->root; - spin_lock(&memcg->scanstat.lock); - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec); - spin_unlock(&memcg->scanstat.lock); + cgroup = memcg->css.cgroup; + do { + spin_lock(&memcg->scanstat.lock); + __mem_cgroup_record_scanstat( + memcg->scanstat.hierarchy_stats[context], rec); + spin_unlock(&memcg->scanstat.lock); + if (!cgroup->parent) + break; + cgroup = cgroup->parent; + memcg = mem_cgroup_from_cont(cgroup); + } while (memcg->use_hierarchy && memcg != rec->root);
Okay, so this looks correct, but it sums up all parents after each memcg scanned, which could have a performance impact. Usually, hierarchy statistics are only summed up when a user reads them. I don't get why this has to be done completely different from the way we usually do things, without any justification, whatsoever. Why do you want to pass a recording structure down the reclaim stack? Why not make it per-cpu counters that are only summed up, together with the hierarchy values, when someone is actually interested in them? With an interface like mem_cgroup_count_vm_event(), or maybe even an extension of that function? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>