Re: [RFC][PATCH] memcg: remove PCG_ACCT_LRU.
From: KAMEZAWA Hiroyuki <hidden>
Date: 2011-12-06 10:22:12
Also in:
linux-mm
On Mon, 5 Dec 2011 23:36:34 -0800 (PST) Hugh Dickins [off-list ref] wrote:
> On Tue, 6 Dec 2011, KAMEZAWA Hiroyuki wrote:
> > On Mon, 5 Dec 2011 16:13:06 -0800 (PST) Hugh Dickins [off-list ref] wrote:
> > > Ying and I found PageCgroupAcctLRU very hard to grasp, even despite
> > > the comments Hannes added to explain it.
> >
> > Now, I don't think it's difficult.
> >
> > It seems no file system codes add pages to LRU before
> > add_to_page_cache() (I checked.)
> > So, what we need to care is only swap-cache. In swap-cache path,
> > we can do slow work.
>
> I've been reluctant to add more special code for SwapCache: it may or
> may not be a good idea. Hannes also noted a FUSE case which requires
> the before-commit-after handling swap was using (for memcg-zone lru
> locking we've merged them into commit).

I think we need a fix for FUSE. In the past, FUSE/splice used
add_to_page_cache(), but now it uses replace_page_cache(). So that path
needs separate care. (I posted a patch.)

> > > In moving the LRU locking from zone to memcg, we needed to depend upon
> > > pc->mem_cgroup: that was difficult while the interpretation of
> > > pc->mem_cgroup depended upon two flags also; and very tricky when pages
> > > were liable to shift underneath you from one LRU to another, as flags
> > > came and went. So we already eliminated PageCgroupAcctLRU here.
> >
> > Okay, Hm, do you see performance improvement by moving locks ?
>
> I was expecting someone to ask that question!
>
> I'm not up-to-date on it, it's one of the things I have to get help to
> gather before sending in the patch series. I believe the answer is that
> we saw some improvement on some tests, but not so much as to make a
> hugely compelling case for the change. But by that time we'd invested
> a lot of testing in the memcg locking, and little in the original zone
> locking, so went with the memcg locking anyway.
>
> We'll get more results and hope to show a stronger case for it now.
> But our results will probably have to be based on in-house kernels,
> with a lot of the "infrastructure" mods already in place, to allow an
> easy build-time switch between zone locking and memcg locking.
>
> That won't be such a fair test if the "infrastructure" mods are
> themselves detrimental (I believe not). It would be better to compare,
> say, 3.2.0-next against 3.2.0-next plus our patches - but my own (quad)
> machines for testing upstream kernels won't be big enough to show much
> of interest. I'm rather hoping someone will be interested enough to
> try on something beefier.

Hmm, at first glance the patch seems far more complicated than I
expected, and it adds many checks and hooks to the LRU path...

> > > However, I've hardly begun splitting the changes up into a series:
> > > had intended to do so last week, but day followed day...
> > > If you'd like to see the unpolished uncommented rollup, I can post that.
> >
> > please.
> > Anyway, I'll post my own again as output even if I stop my work there.
>
> Okay, here it is: my usual mix of cleanup and functional changes.
> There's work by Ying and others in here - will apportion authorship
> more fairly when splitting. If you're looking through it at all,
> the place to start would be memcontrol.c's lock_page_lru_irqsave().

Thank you. This seems an interesting patch.

Hmm... what I think of now is:

In most cases, pages are newly allocated and charged, and then added to
the LRU. pc->mem_cgroup never changes while pages are on the LRU.

I have a fix for the corner cases, which is to do

  1. lock lru
  2. remove-page-from-lru
  3. overwrite pc->mem_cgroup
  4. add page to lru again
  5. unlock lru

and to blindly believe pc->mem_cgroup, regardless of the PCG_USED bit,
at LRU handling.

Hm, per-zone-per-memcg lru locking is much easier if
 - we ignore the PCG_USED bit at lru handling
 - we never overwrite pc->mem_cgroup if the page is on the LRU.
 - if a page may be added to the LRU by pagevec etc. while we overwrite
   pc->mem_cgroup, we always take lru_lock.
   This is our corner case.

Isn't it?

I posted a series of patches. I'd be glad if you could give them a quick
review.

Thanks,
-Kame
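
For illustration only, the five-step corner-case sequence in this message can be sketched as a small userspace model. This is not kernel code and not taken from either patch series: every `toy_` name is hypothetical, the spinlock merely stands in for the LRU lock, and the per-group counter stands in for the real LRU lists. The only point it tries to show is that the LRU helpers trust pc->mem_cgroup alone (no "used" flag is consulted), and that the owner is overwritten only while the page is off the LRU and the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's real ones. */
struct toy_memcg {
    const char *name;
    long nr_lru_pages;            /* pages currently on this group's LRU */
};

struct toy_page_cgroup {
    struct toy_memcg *mem_cgroup; /* owner; trusted blindly by LRU code */
    bool on_lru;
};

/* Hypothetical stand-in for the LRU lock: a trivial C11 spinlock. */
static atomic_flag toy_lru_lock = ATOMIC_FLAG_INIT;

void toy_lru_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&toy_lru_lock,
                                             memory_order_acquire))
        ; /* spin until the lock is free */
}

void toy_lru_lock_release(void)
{
    atomic_flag_clear_explicit(&toy_lru_lock, memory_order_release);
}

/* LRU add/remove look only at pc->mem_cgroup, never at a "used" bit. */
void toy_lru_add(struct toy_page_cgroup *pc)
{
    pc->mem_cgroup->nr_lru_pages++;
    pc->on_lru = true;
}

void toy_lru_del(struct toy_page_cgroup *pc)
{
    pc->mem_cgroup->nr_lru_pages--;
    pc->on_lru = false;
}

/* Corner case: change the owner of a page that may already be on an LRU. */
void toy_change_owner(struct toy_page_cgroup *pc, struct toy_memcg *to)
{
    bool was_on_lru;

    assert(to != NULL);

    toy_lru_lock_acquire();       /* 1. lock lru */
    was_on_lru = pc->on_lru;
    if (was_on_lru)
        toy_lru_del(pc);          /* 2. remove page from lru */
    pc->mem_cgroup = to;          /* 3. overwrite pc->mem_cgroup */
    if (was_on_lru)
        toy_lru_add(pc);          /* 4. add page to lru again */
    toy_lru_lock_release();       /* 5. unlock lru */
}
```

In this model a page that sits on group A's LRU and is handed to group B ends up counted on B's LRU only, and at no point is it on an LRU while its owner pointer disagrees with the list it is counted on.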