Re: [PATCH V3] memcg, oom: provide more precise dump info while memcg oom happening
From: Michal Hocko <hidden>
Date: 2012-11-09 10:50:57
Also in:
linux-mm, lkml
On Fri 09-11-12 18:23:07, Sha Zhengju wrote:
On 11/09/2012 12:25 AM, Michal Hocko wrote:quoted
On Thu 08-11-12 23:52:47, Sha Zhengju wrote:
[...]
quoted
quoted
+ for (i = 0; i< MEM_CGROUP_STAT_NSTATS; i++) { + long long val = 0; + if (i == MEM_CGROUP_STAT_SWAP&& !do_swap_account) + continue; + for_each_mem_cgroup_tree(mi, memcg) + val += mem_cgroup_read_stat(mi, i); + printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val)); + } + + for (i = 0; i< NR_LRU_LISTS; i++) { + unsigned long long val = 0; + + for_each_mem_cgroup_tree(mi, memcg) + val += mem_cgroup_nr_lru_pages(mi, BIT(i)); + printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val)); + } + printk(KERN_CONT "\n");This is nice and simple I am just thinking whether it is enough. Say that you have a deeper hierarchy and the there is a safety limit in the its root A (limit) /|\ B C D |\ E F and we trigger an OOM on the A's limit. Now we know that something blew up but what it was we do not know. Wouldn't it be better to swap the for and for_each_mem_cgroup_tree loops? Then we would see the whole hierarchy and can potentially point at the group which doesn't behave. Memory cgroup stats for A/: ... Memory cgroup stats for A/B/: ... Memory cgroup stats for A/C/: ... Memory cgroup stats for A/D/: ... Memory cgroup stats for A/D/E/: ... Memory cgroup stats for A/D/F/: ... Would it still fit in with your use case? [...]We haven't used those complicate hierarchy yet, but it sounds a good suggestion. :) Hierarchy is a little complex to use from our experience, and the three cgroups involved in memcg oom can be different: memcg of invoker, killed task, memcg of going over limit.Suppose a process in B triggers oom and a victim in root A is selected to be killed, we may as well want to know memcg stats just local in A cgroup(excludes BCD). So besides hierarchy info, does it acceptable to also print the local root node stats which as I did in the V1 version(https://lkml.org/lkml/2012/7/30/179).
Ohh, I probably wasn't clear enough. I didn't suggest cumulative
numbers. Only per group. So it would be something like:
for_each_mem_cgroup_tree(mi, memcg) {
printk("Memory cgroup stats for %s", memcg_name);
for (i = 0; i< MEM_CGROUP_STAT_NSTATS; i++) {
if (i == MEM_CGROUP_STAT_SWAP&& !do_swap_account)
continue;
printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i],
K(mem_cgroup_read_stat(mi, i)));
}
for (i = 0; i< NR_LRU_LISTS; i++)
printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i],
K(mem_cgroup_nr_lru_pages(mi, BIT(i))));
printk(KERN_CONT"\n");
}
Another one I'm hesitating is numa stats, it seems the output is beginning to get more and more....
NUMA stats are basically per node - per zone LRU data and that the for(NR_LRU_LISTS) can be easily extended to cover that. -- Michal Hocko SUSE Labs