Thread (127 messages) 127 messages, 9 authors, 2012-10-08

Re: [PATCH v3 06/13] memcg: kmem controller infrastructure

From: Michal Hocko <hidden>
Date: 2012-09-27 13:44:36
Also in: linux-mm, lkml

On Thu 27-09-12 15:31:57, Glauber Costa wrote:
On 09/26/2012 07:51 PM, Michal Hocko wrote:
quoted
On Tue 18-09-12 18:04:03, Glauber Costa wrote:
[...]
quoted
quoted
+	*_memcg = NULL;
+	rcu_read_lock();
+	p = rcu_dereference(current->mm->owner);
+	memcg = mem_cgroup_from_task(p);
mem_cgroup_from_task says it can return NULL. Do we care here? If not
then please put VM_BUG_ON(!memcg) here.
quoted
+	rcu_read_unlock();
+
+	if (!memcg_can_account_kmem(memcg))
+		return true;
+
+	mem_cgroup_get(memcg);
I am confused. Why do we take a reference to memcg rather than css_get
here? Ahh it is because we keep the reference while the page is
allocated, right? Comment please.
ok.
quoted
I am still not sure whether we need css_get here as well. How do you
know that the current is not moved in parallel and it is a last task in
a group which then can go away?
the reference count aquired by mem_cgroup_get will still prevent the
memcg from going away, no?
Yes but you are outside of the rcu now and we usually do css_get before
we rcu_unlock. mem_cgroup_get just makes sure the group doesn't get
deallocated but it could be gone before you call it. Or I am just
confused - these 2 levels of ref counting is really not nice.

Anyway, I have just noticed that __mem_cgroup_try_charge does
VM_BUG_ON(css_is_removed(&memcg->css)) on a given memcg so you should
keep css ref count up as well.
quoted
quoted
+	/* The page allocation failed. Revert */
+	if (!page) {
+		memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
+		return;
+	}
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	pc->mem_cgroup = memcg;
+	SetPageCgroupUsed(pc);
+	unlock_page_cgroup(pc);
+}
+
+void __memcg_kmem_uncharge_page(struct page *page, int order)
+{
+	struct mem_cgroup *memcg = NULL;
+	struct page_cgroup *pc;
+
+
+	pc = lookup_page_cgroup(page);
+	/*
+	 * Fast unlocked return. Theoretically might have changed, have to
+	 * check again after locking.
+	 */
+	if (!PageCgroupUsed(pc))
+		return;
+
+	lock_page_cgroup(pc);
+	if (PageCgroupUsed(pc)) {
+		memcg = pc->mem_cgroup;
+		ClearPageCgroupUsed(pc);
+	}
+	unlock_page_cgroup(pc);
+
+	/*
+	 * Checking if kmem accounted is enabled won't work for uncharge, since
+	 * it is possible that the user enabled kmem tracking, allocated, and
+	 * then disabled it again.
disabling cannot happen, right?
not anymore, right. I can update the comment, 
yes, it is confusing
but I still believe it is a lot saner to trust information in
page_cgroup.
I have no objections against that. PageCgroupUsed test and using
pc->mem_cgroup is fine.
quoted
quoted
+#ifdef CONFIG_MEMCG_KMEM
+int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
+{
+	struct res_counter *fail_res;
+	struct mem_cgroup *_memcg;
+	int ret;
+	bool may_oom;
+	bool nofail = false;
+
+	may_oom = (gfp & __GFP_WAIT) && (gfp & __GFP_FS) &&
+	    !(gfp & __GFP_NORETRY);
A comment please? Why __GFP_IO is not considered for example?
Actually, I believe testing for GFP_WAIT and !GFP_NORETRY would be enough.

The rationale here is, of course, under which circumstance would it be
valid to call the oom killer? Which is, if the allocation can wait, and
can retry.
Yes __GFP_WAIT is clear because memcg OOM can wait for arbitrary amount
of time (wait for userspace action on oom_control). __GFP_NORETRY
couldn't get to oom before because oom was excluded explicitely for THP
and migration didn't go through the charging path to reach the oom.
But I do agree that __GFP_NORETRY allocations shouldn't cause the OOM
because we should rather fail the allocation from kernel rather than
shoot something.

-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help