Re: [RFC REPOST] cgroup: removing css reference drain wait during cgroup removal
From: KAMEZAWA Hiroyuki <hidden>
Date: 2012-03-16 00:04:23
Also in:
cgroups, lkml
(2012/03/15 20:24), Glauber Costa wrote:
> On 03/15/2012 04:16 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/14 18:46), Glauber Costa wrote:
>>> On 03/14/2012 04:28 AM, KAMEZAWA Hiroyuki wrote:
>>>> IIUC, in general, even if the processes are in a tree, in the major
>>>> case of servers, their workloads are independent. I think FLAT mode
>>>> is the default. 'hierarchical' is a crazy thing which cannot be
>>>> managed.
>>>
>>> Better pay attention to the current overall cgroups discussions being
>>> held by Tejun then. ([RFD] cgroup: about multiple hierarchies)
>>>
>>> The topic of whether to adapt all cgroups to be hierarchical by
>>> default is a recurring one.
>>>
>>> I personally think that it is not unachievable to make res_counters
>>> cheaper, therefore making this less of a problem.
>>
>> I thought of this a little yesterday. My current idea is to apply the
>> following rules to res_counter.
>>
>> 1. All res_counters are hierarchical. But behavior should be optimized.
>> 2. If parent res_counter has UNLIMITED limit, 'usage' will not be
>>    propagated to its parent at _charge_.
>
> That doesn't seem to make much sense. If you are unlimited, but your
> parent is limited, he has a lot more interest to know about the charge
> than you do.
Sorry, I should have written 'If all ancestors are unlimited'. If a parent is limited, the children should be treated as limited.
> So the logic should rather be the opposite: Don't go around getting
> locks and all that if you are unlimited. Your parent might, though.
>
> I am trying to experiment a bit with billing to percpu counters for
> unlimited res_counters. But their inexact nature is giving me quite a
> headache.
Personally, I think a percpu counter is not the best choice. Yes, it will work, but...
because of the error range inherent in it, it has a scalability problem. Consider
a tree like
/A/B/Guest0/tasks
     Guest1/tasks
     Guest2/tasks
     Guest4/tasks
     Guest5/tasks
     ......
A percpu res_counter may scale well at the GuestX level but will conflict at level B.
And I don't want to think about what happens on a 256-cpu system. The error in B will be
very big.
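
To make the error-range concern concrete, here is a minimal userspace model of a batched per-CPU counter. It is only a sketch: the names (model_pcpu_counter, model_add, and so on), the fixed CPU count and the batch value are assumptions chosen for illustration, and it is not the kernel's percpu_counter code. Each CPU keeps a private delta and folds it into the shared total only once the delta reaches the batch size, so a cheap read can be off by up to (batch - 1) per CPU.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NCPUS 256   /* hypothetical machine size from the mail */

/* Illustrative model only, not the kernel's percpu_counter. */
struct model_pcpu_counter {
    long global;               /* shared total, touched only on a fold */
    long local[MODEL_NCPUS];   /* per-CPU deltas not yet folded */
    long batch;                /* fold threshold */
};

static void model_init(struct model_pcpu_counter *c, long batch)
{
    assert(batch > 0);
    c->global = 0;
    c->batch = batch;
    for (int i = 0; i < MODEL_NCPUS; i++)
        c->local[i] = 0;
}

/* Add on one CPU; fold into the shared total once |delta| >= batch. */
static void model_add(struct model_pcpu_counter *c, int cpu, long amount)
{
    long v = c->local[cpu] + amount;

    if (labs(v) >= c->batch) {
        c->global += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Cheap read: ignores the unfolded per-CPU deltas. */
static long model_read_fast(const struct model_pcpu_counter *c)
{
    return c->global;
}

/* Exact read: has to visit every CPU. */
static long model_read_exact(const struct model_pcpu_counter *c)
{
    long sum = c->global;

    for (int i = 0; i < MODEL_NCPUS; i++)
        sum += c->local[i];
    return sum;
}
```

In this model, with a batch of 32 and every one of the 256 CPUs holding a delta of 31, the cheap read still reports 0 while the exact value is 256 * 31 = 7936. A limit check at level B based on the cheap read would miss all of that.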
Another idea is to lend resource from the memcg to the tasks, i.e. having per-task
caching of charges. But it has two problems: draining unused resource is difficult,
and the precise usage is unknown.
IMHO, hard-limited resource counter itself may be a problem ;)
So the idea 'if all ancestors are unlimited, don't propagate charges'
comes to my mind. With this, people who use resources in a FLAT manner (but have a
hierarchical cgroup tree) will not see any performance problem.
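
One possible reading of this rule, as a single-threaded userspace sketch. The struct rc_node, RC_UNLIMITED and both function names are hypothetical and are not the kernel's res_counter API; locking is left out entirely. A charge is always recorded at the node itself and then walks upward only as far as the topmost ancestor that has a finite limit. Above that point every ancestor is unlimited, so nothing is touched.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define RC_UNLIMITED ULLONG_MAX   /* assumed sentinel for "no limit" */

/* Hypothetical counter node, for illustration only. */
struct rc_node {
    unsigned long long usage;
    unsigned long long limit;
    struct rc_node *parent;
};

/* Topmost node on the path rc -> root with a finite limit, or NULL. */
static struct rc_node *rc_top_limited(struct rc_node *rc)
{
    struct rc_node *top = NULL;

    for (struct rc_node *c = rc; c; c = c->parent)
        if (c->limit != RC_UNLIMITED)
            top = c;
    return top;
}

/*
 * Charge val to rc. Propagation stops after the topmost limited
 * ancestor; if no node on the path is limited, only rc is charged.
 * Returns false, changing nothing, if any limit would be exceeded.
 */
static bool rc_charge(struct rc_node *rc, unsigned long long val)
{
    struct rc_node *top = rc_top_limited(rc);
    struct rc_node *stop = top ? top->parent : rc->parent;
    struct rc_node *c;

    /* First pass: check every limited node on the charged path. */
    for (c = rc; c != stop; c = c->parent)
        if (c->limit != RC_UNLIMITED &&
            (c->usage > c->limit || val > c->limit - c->usage))
            return false;

    /* Second pass: apply the charge. */
    for (c = rc; c != stop; c = c->parent)
        c->usage += val;
    return true;
}
```

With a limited guest under unlimited libvirt and root nodes, a charge to the guest updates only the guest. With a limited node at the top and unlimited nodes below it, the charge still propagates all the way up, which matches "if a parent is limited, the children should be treated as limited".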
>> 3. If a res_counter has UNLIMITED limit, at reading usage, it must
>>    visit all children and return a sum of them.
>>
>> Then,
>>    /cgroup/memory/          (unlimited)
>>        libvirt/             (unlimited)
>>            qemu/            (unlimited)
>>                guest/       (limited)
>>
>> All dirs can show hierarchical usage and the guest will not have any
>> lock contention at runtime.
>
> If we are okay with summing it up at read time, we may as well keep
> everything in percpu counters at all times.
If all ancestors are unlimited, we don't need to propagate usage upwards at charging. If any ancestor is limited, we need to propagate and check usage at charging.
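
The read side of the same idea can be sketched as follows. This is an illustration under stated assumptions, not existing kernel code: struct rc_tree, rc_add_child and rc_read_usage are hypothetical names, there is no locking, and the node being read is assumed to have only unlimited ancestors, so an unlimited node holds only what was charged to it directly while a limited node already holds the total for its subtree.

```c
#include <limits.h>
#include <stddef.h>

#define RC_UNLIMITED ULLONG_MAX   /* assumed sentinel for "no limit" */

/* Hypothetical counter that keeps its own children list. */
struct rc_tree {
    unsigned long long usage;     /* charges recorded at this node */
    unsigned long long limit;
    struct rc_tree *first_child;
    struct rc_tree *next_sibling;
};

/* Link child under parent (pushes onto the front of the list). */
static void rc_add_child(struct rc_tree *parent, struct rc_tree *child)
{
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/*
 * Hierarchical usage at read time. A limited node was charged for its
 * whole subtree, so its own value is the answer. An unlimited node
 * received no propagated charges, so its children are summed instead.
 */
static unsigned long long rc_read_usage(const struct rc_tree *rc)
{
    unsigned long long sum = rc->usage;

    if (rc->limit != RC_UNLIMITED)
        return sum;
    for (const struct rc_tree *c = rc->first_child; c; c = c->next_sibling)
        sum += rc_read_usage(c);
    return sum;
}
```

For the memory/libvirt/qemu/guest tree quoted above, reading any of the three unlimited directories walks down to the limited guests and adds up their usage, so the cost moves from every charge to the much rarer read.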
>> By this
>> 1. no runtime overhead if the parent has unlimited limit.
>> 2. All res_counters can show aggregate resource usage of children.
>>
>> To do this
>> 1. res_counter should have a children list by itself.
>>
>> Implementation problem
>> - What should happen when a user sets a new limit on a res_counter
>>   which has children? Shouldn't we allow it? Or take all locks of
>>   children and update atomically?
>
> Well, increasing the limit should be always possible.
> As for the kids, how about:
> - ) Take their locks
> - ) scan through them seeing if their usage is below the new allowance
>     -) if it is, then ok
>     -) if it is not, then try to reclaim (*). Fail if it is not possible.
>
> (*) May be hard to implement, because we already have the res_counter
> lock taken, and the code may get nasty. So maybe it is better just fail
> if any of your kids usage is over the new allowance...

Seems enough and seems worth a try.
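
The simpler variant discussed here (no reclaim, just fail) could look roughly like this. It is a single-threaded sketch with hypothetical names (struct rc_lim, rc_set_limit); the places where the children's locks would be taken are only marked in comments, and returning -EBUSY on failure is an assumption of this model.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical counter with a children list, for illustration only. */
struct rc_lim {
    unsigned long long usage;
    unsigned long long limit;
    struct rc_lim *first_child;
    struct rc_lim *next_sibling;
};

/*
 * Set a new limit on rc. Raising the limit always succeeds. Lowering
 * it fails with -EBUSY if rc itself or any direct child is already
 * using more than the new allowance. No reclaim is attempted.
 */
static int rc_set_limit(struct rc_lim *rc, unsigned long long new_limit)
{
    if (new_limit >= rc->limit) {
        rc->limit = new_limit;
        return 0;
    }

    if (rc->usage > new_limit)
        return -EBUSY;

    /* A real implementation would take each child's lock here. */
    for (struct rc_lim *c = rc->first_child; c; c = c->next_sibling)
        if (c->usage > new_limit)
            return -EBUSY;

    rc->limit = new_limit;
    /* ...and drop the children's locks here. */
    return 0;
}
```

A failed call leaves the old limit in place, so the caller can free up usage in the children and retry.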
>> - memory.use_hierarchy should be obsolete ?
>
> If we're going fully hierarchical, yes.
Another big problem is 'when' we should make this change. Maybe this 'hierarchical' problem will be a good topic for the MM summit.

Thanks,
-Kame