Thread (36 messages) 36 messages, 5 authors, 2020-06-22

Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API

From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2020-05-12 21:58:35
Also in: linux-mm, lkml
Subsystem: memory management, memory management - thp (transparent huge page), the rest · Maintainers: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Linus Torvalds

On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote:
quoted
On May 8, 2020, at 2:30 PM, Johannes Weiner [off-list ref] wrote:

With the page->mapping requirement gone from memcg, we can charge anon
and file-thp pages in one single step, right after they're allocated.

This removes two out of three API calls - especially the tricky commit
step that needed to happen at just the right time between when the
page is "set up" and when it's "published" - somewhat vague and fluid
concepts that varied by page type. All we need is a freshly allocated
page and a memcg context to charge.

v2: prevent double charges on pre-allocated hugepages in khugepaged

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <redacted>
---
include/linux/mm.h      |  4 +---
kernel/events/uprobes.c | 11 +++--------
mm/filemap.c            |  2 +-
mm/huge_memory.c        |  9 +++------
mm/khugepaged.c         | 35 ++++++++++-------------------------
mm/memory.c             | 36 ++++++++++--------------------------
mm/migrate.c            |  5 +----
mm/swapfile.c           |  6 +-----
mm/userfaultfd.c        |  5 +----
9 files changed, 31 insertions(+), 82 deletions(-)
[]
quoted
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
@@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
out_up_write:
	up_write(&mm->mmap_sem);
out_nolock:
+	if (*hpage)
+		mem_cgroup_uncharge(*hpage);
	trace_mm_collapse_huge_page(mm, isolated, result);
	return;
out:
-	mem_cgroup_cancel_charge(new_page, memcg);
	goto out_up_write;
}
[]

Some memory pressure will crash this new code. It looks like somewhat racy.

if (!page->mem_cgroup)

where page == NULL in mem_cgroup_uncharge().
Thanks for the report, sorry about the inconvenience.

Hm, the page is exclusive at this point, nobody else should be
touching it. After all, khugepaged might reuse the preallocated page
for another pmd if this one fails to collapse.

Looking at the code, I think it's page itself that's garbage, not
page->mem_cgroup changing. If you have CONFIG_NUMA and the allocation
fails, *hpage could contain an ERR_PTR instead of being NULL.

I think we need the following fixlet:
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f2e0a5e5cfbb..f6161e17da26 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 out_up_write:
 	up_write(&mm->mmap_sem);
 out_nolock:
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
@@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	/* TODO: tracepoints */
 }
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help