Re: [RFC][PATCH] memcg: remove PCG_ACCT_LRU.
From: KAMEZAWA Hiroyuki <hidden>
Date: 2011-12-07 01:49:10
Also in:
linux-mm
On Tue, 6 Dec 2011 15:50:33 -0800 (PST) Hugh Dickins [off-list ref] wrote:
> On Tue, 6 Dec 2011, KAMEZAWA Hiroyuki wrote:
>> On Mon, 5 Dec 2011 23:36:34 -0800 (PST) Hugh Dickins [off-list ref] wrote:
>>
>> Hmm, at first glance at the patch, it seems far more complicated than
>> I expected
>
> Right, this is just a rollup of assorted changes, yet to be presented
> properly as an understandable series.
>
>> and added many checks and hooks to the lru path...
>
> Actually, I think it removes more than it adds; while trying not to
> increase the overhead of lookup_page_cgroup()s and locking.
>>> Okay, here it is: my usual mix of cleanup and functional changes.
>>> There's work by Ying and others in here - will apportion authorship
>>> more fairly when splitting. If you're looking through it at all, the
>>> place to start would be memcontrol.c's lock_page_lru_irqsave().
>>
>> Thank you. This seems an interesting patch.
>>
>> Hmm... what I think of now is:
>>
>> In most cases, pages are newly allocated and charged, and then added
>> to the LRU. pc->mem_cgroup never changes while pages are on the LRU.
>>
>> I have a fix for the corner cases, which is to do
>>  1. lock lru
>>  2. remove-page-from-lru
>>  3. overwrite pc->mem_cgroup
>>  4. add page to lru again
>>  5. unlock lru
>
> That is indeed the sequence which __mem_cgroup_commit_charge() follows
> after the patch. But it optimizes out the majority of cases when no such
> lru operations are needed (optimizations best presented in a separate
> patch), while being careful about the tricky case when the page is on
> lru_add_pvecs, and may get on to an lru at any moment.
>
> And since it uses a separate lock for each memcg-zone's set of lrus, must
> take care that both lock and lru in 4 and 5 are different from those in
> 1 and 2.
Yes, once we have a per-zone-per-memcg lock, the above sequence needs
some care. With a naive solution:

 1. get lruvec-1 from target pc->mem_cgroup
 2. get lruvec-2 from target memcg to be charged.
 3. lock lruvec-x lock
 4. lock lruvec-y lock (x and y order is determined by css_id ?)
 5. remove from LRU.
 6. overwrite pc->mem_cgroup
 7. add page to lru again
 8. unlock lruvec-y
 9. unlock lruvec-x

Hm, maybe there is another, cleverer way..
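To make the naive two-lock sequence concrete, here is a rough userspace sketch. It is only a model under stated assumptions: every type and function name below (lruvec_model, page_model, lock_pair, move_page, ...) is made up for illustration, the "lock" is a plain flag rather than a spinlock, and ordering by an integer id stands in for the css_id ordering that is only floated as a question above.

```c
/* Userspace model only: all types and names are hypothetical stand-ins. */
struct lruvec_model {
    int memcg_id;   /* stands in for css_id; used only to order the locks */
    int locked;     /* stands in for the per-zone-per-memcg lru lock */
    int nr_pages;   /* pages currently linked on this lruvec */
};

struct page_model {
    struct lruvec_model *lruvec;  /* what pc->mem_cgroup + zone resolve to */
    int on_lru;
};

static void model_lock(struct lruvec_model *v)   { v->locked = 1; }
static void model_unlock(struct lruvec_model *v) { v->locked = 0; }

/*
 * Steps 3-4: always take the lower id first, so that two CPUs moving pages
 * in opposite directions between the same pair cannot deadlock (ABBA).
 */
static void lock_pair(struct lruvec_model *x, struct lruvec_model *y,
                      struct lruvec_model **first,
                      struct lruvec_model **second)
{
    if (x->memcg_id <= y->memcg_id) {
        *first = x;
        *second = y;
    } else {
        *first = y;
        *second = x;
    }
    model_lock(*first);
    if (*second != *first)
        model_lock(*second);
}

/* Steps 8-9: release in the reverse order of acquisition. */
static void unlock_pair(struct lruvec_model *first,
                        struct lruvec_model *second)
{
    if (second != first)
        model_unlock(second);
    model_unlock(first);
}

/* Steps 1-9: move a page's accounting from its current lruvec to 'to'. */
static void move_page(struct page_model *page, struct lruvec_model *to)
{
    struct lruvec_model *from = page->lruvec;   /* step 1 (step 2 is 'to') */
    struct lruvec_model *first, *second;
    int was_on_lru;

    lock_pair(from, to, &first, &second);       /* steps 3-4 */
    was_on_lru = page->on_lru;
    if (was_on_lru) {                           /* step 5 */
        from->nr_pages--;
        page->on_lru = 0;
    }
    page->lruvec = to;                          /* step 6 */
    if (was_on_lru) {                           /* step 7 */
        to->nr_pages++;
        page->on_lru = 1;
    }
    unlock_pair(first, second);                 /* steps 8-9 */
}
```

The point of the sketch is only that the acquisition order depends on the ids, not on which lruvec is the source and which is the destination.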
>> And blindly believe pc->mem_cgroup regardless of PCG_USED bit at LRU
>> handling.
>
> That's right. The difficulty comes when Used is cleared while the page is
> off lru, or page removed from lru while Used is clear: once lock is
> dropped, we have no hold on the memcg, and must move to root lru lest the
> old memcg get deleted.
>
> The old Used + AcctLRU + pc->mem_cgroup puppetry used to achieve that
> quite cleverly; but in distributing zone lru_locks over memcgs, we went
> through a lot of crashes before we understood the subtlety of it; and in
> most places were just fighting the way it shifted underneath us.
>
> Now mem_cgroup_move_uncharged_to_root() makes the move explicit, in just
> a few places.
>
>> Hm, per-zone-per-memcg lru locking is much easier if
>>  - we ignore PCG_USED bit at lru handling
>
> I may or may not agree with you, depending on what you mean!
Ah, after my patch,
    mem_cgroup_lru_add(zone, page) {
        pc = lookup_page_cgroup(page);
        memcg = pc->mem_cgroup;
        lruvec = lruvec(memcg, zone);
        /* update zone stat for memcg */
    }
Then there is no flag check when handling the lru.
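As a rough illustration of what "no flag check" means, a userspace model of that add path might look like the following. This is only a sketch: the *_model types, MODEL_NR_ZONES and model_lru_add() are hypothetical names, and the page_cgroup is embedded in the page here instead of being found via lookup_page_cgroup().

```c
#define MODEL_NR_ZONES 2

/* Userspace model only: all types and names are hypothetical stand-ins. */
struct mem_cgroup_model {
    unsigned long lru_size[MODEL_NR_ZONES];  /* per-zone LRU page count */
};

struct page_cgroup_model {
    struct mem_cgroup_model *mem_cgroup;
    int used;                                /* stands in for PCG_USED */
};

struct page_model {
    struct page_cgroup_model pc;  /* the kernel finds this by lookup instead */
    int zone;                     /* index of the zone the page belongs to */
};

/* Model of the simplified add path: trust pc->mem_cgroup, test no flags. */
static struct mem_cgroup_model *model_lru_add(struct page_model *page)
{
    struct page_cgroup_model *pc = &page->pc;
    struct mem_cgroup_model *memcg = pc->mem_cgroup;

    /* Deliberately no check of pc->used here. */
    memcg->lru_size[page->zone]++;
    return memcg;
}
```

Even with the used bit clear, the page is accounted to whatever pc->mem_cgroup points at, which is why that pointer must never be stale while the page can reach an LRU.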
>>  - we never overwrite pc->mem_cgroup if the page is on LRU.
>
> That's not the way I was thinking of it, but I think that's what we're
> doing.
I do this with a new rule: "If the page may be on the LRU at commit_charge, lru_lock should be held and PageLRU must be cleared."
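A minimal userspace sketch of that rule, assuming a single zone-wide lru_lock and a caller that knows whether the page may be on the LRU. All names here (zone_model, memcg_model, model_commit_charge, rule_violations, ...) are invented for the illustration and are not the kernel's.

```c
/* Userspace model only: all types and names are hypothetical stand-ins. */
struct zone_model {
    int lru_locked;        /* stands in for zone->lru_lock */
};

struct memcg_model {
    int nr_lru_pages;      /* pages this memcg has on the zone's LRU */
};

struct page_model {
    struct zone_model *zone;
    struct memcg_model *mem_cgroup;  /* stands in for pc->mem_cgroup */
    int page_lru;                    /* stands in for the PageLRU flag */
};

/* Counts overwrites done while the page was still visibly on the LRU. */
static int rule_violations;

static void set_owner(struct page_model *page, struct memcg_model *memcg)
{
    if (page->page_lru)
        rule_violations++;
    page->mem_cgroup = memcg;
}

/*
 * 'may_be_on_lru' is the caller's knowledge: a newly allocated page cannot
 * be on the LRU, so the common case skips the lock entirely.
 */
static void model_commit_charge(struct page_model *page,
                                struct memcg_model *memcg, int may_be_on_lru)
{
    int was_on_lru = 0;

    if (may_be_on_lru) {
        page->zone->lru_locked = 1;              /* take lru_lock */
        if (page->page_lru) {
            page->page_lru = 0;                  /* clear PageLRU */
            page->mem_cgroup->nr_lru_pages--;    /* unlink from old owner */
            was_on_lru = 1;
        }
    }

    set_owner(page, memcg);                      /* overwrite pc->mem_cgroup */

    if (may_be_on_lru) {
        if (was_on_lru) {
            memcg->nr_lru_pages++;               /* relink under new owner */
            page->page_lru = 1;
        }
        page->zone->lru_locked = 0;              /* drop lru_lock */
    }
}
```

In this model the owner pointer is only ever written while the page is off the LRU, so the add path can trust it without looking at any flag.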
>>  - if page may be added to LRU by pagevec etc.. while we overwrite
>>    pc->mem_cgroup, we always take lru_lock. This is our corner case.
>
> Yes, the tricky case I mention above.
>
>> isn't it ? I posted a series of patches. I'd be glad if you could give
>> me a quick review.
>
> I haven't glanced yet, will do so after an hour or two.
I think Johannes's changes removing page_cgroup->lru give us various
chances for optimization/simplification.

Thanks,
-Kame