[PATCH 6/9] mm: assert that get_page_unless_zero() callers hold the rcu lock

[PATCH 0/9] Use RCU to stabilize page counts · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 1/9] mm: rcu read lock for getting reference on pages in migration_entry_wait() · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 4/9] mm: use get_page in deactivate_page() · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 6/9] mm: assert that get_page_unless_zero() callers hold the rcu lock · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 5/9] kvm: use get_page instead of get_page_unless_zero · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 8/9] mm: add API for setting a grace period cookie on compound pages · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 7/9] rcu: rcu_get_gp_cookie() / rcu_gp_cookie_elapsed() stand-ins · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 9/9] mm: make sure tail page counts are stable before splitting THP pages · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 3/9] mm: rcu read lock when getting from tail to head page · Michel Lespinasse <hidden> · 2011-08-19
[PATCH 2/9] mm: avoid calling get_page_unless_zero() when charging cgroups · Michel Lespinasse <hidden> · 2011-08-19
Re: [PATCH 0/9] Use RCU to stabilize page counts · Michel Lespinasse <hidden> · 2011-08-19
[PATCH] thp: tail page refcounting fix · Andrea Arcangeli <hidden> · 2011-08-22
Re: [PATCH] thp: tail page refcounting fix · Andrea Arcangeli <hidden> · 2011-08-23
Re: [PATCH] thp: tail page refcounting fix · Minchan Kim <hidden> · 2011-08-23
Re: [PATCH] thp: tail page refcounting fix · Andrea Arcangeli <hidden> · 2011-08-23
Re: [PATCH] thp: tail page refcounting fix · Michel Lespinasse <hidden> · 2011-08-23
Re: [PATCH] thp: tail page refcounting fix · Andrea Arcangeli <hidden> · 2011-08-24
Re: [PATCH] thp: tail page refcounting fix · Andrea Arcangeli <hidden> · 2011-08-24
[PATCH] thp: tail page refcounting fix #2 · Andrea Arcangeli <hidden> · 2011-08-24
Re: [PATCH] thp: tail page refcounting fix #2 · Michel Lespinasse <hidden> · 2011-08-26
Re: [PATCH] thp: tail page refcounting fix #2 · Andrea Arcangeli <hidden> · 2011-08-26
[PATCH] thp: tail page refcounting fix #3 · Andrea Arcangeli <hidden> · 2011-08-26
Re: [PATCH] thp: tail page refcounting fix #3 · Michel Lespinasse <hidden> · 2011-08-27
[PATCH] thp: tail page refcounting fix #4 · Andrea Arcangeli <hidden> · 2011-08-27
Re: [PATCH] thp: tail page refcounting fix #4 · Minchan Kim <hidden> · 2011-08-29
[PATCH] thp: tail page refcounting fix #5 · Andrea Arcangeli <hidden> · 2011-09-01
Re: [PATCH] thp: tail page refcounting fix #5 · Michel Lespinasse <hidden> · 2011-09-01
Re: [PATCH] thp: tail page refcounting fix #5 · Andrew Morton <akpm@linux-foundation.org> · 2011-09-01
Re: [PATCH] thp: tail page refcounting fix #5 · Andi Kleen <hidden> · 2011-09-01
Re: [PATCH] thp: tail page refcounting fix #5 · Andrea Arcangeli <hidden> · 2011-09-02
Re: [PATCH] thp: tail page refcounting fix #5 · Andi Kleen <hidden> · 2011-09-02
Re: [PATCH] thp: tail page refcounting fix #5 · Andrew Morton <akpm@linux-foundation.org> · 2011-09-02
[PATCH] thp: tail page refcounting fix #6 · Andrea Arcangeli <hidden> · 2011-09-08
Re: [PATCH] thp: tail page refcounting fix #6 · Peter Zijlstra <peterz@infradead.org> · 2011-09-23
Re: [PATCH] thp: tail page refcounting fix #6 · Andrea Arcangeli <hidden> · 2011-09-30
[PATCH 4/4] powerpc: gup_hugepte() support THP based tail recounting · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 1/4] powerpc: remove superfluous PageTail checks on the pte gup_fast · Andrea Arcangeli <hidden> · 2011-10-16
thp: gup_fast ppc tail refcounting [was Re: [PATCH] thp: tail page refcounting fix #6] · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 2/4] powerpc: get_hugepte() don't put_page() the wrong page · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 3/4] powerpc: gup_hugepte() avoid to free the head page too many times · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 2/4] powerpc: get_hugepte() don't put_page() the wrong page · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 3/4] powerpc: gup_hugepte() avoid to free the head page too many times · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 1/4] powerpc: remove superfluous PageTail checks on the pte gup_fast · Andrea Arcangeli <hidden> · 2011-10-16
thp: gup_fast ppc tail refcounting [was Re: [PATCH] thp: tail page refcounting fix #6] · Andrea Arcangeli <hidden> · 2011-10-16
[PATCH 4/4] powerpc: gup_hugepte() support THP based tail recounting · Andrea Arcangeli <hidden> · 2011-10-16
thp: gup_fast s390/sparc tail refcounting [was Re: [PATCH] thp: tail page refcounting fix #6] · Andrea Arcangeli <hidden> · 2011-10-17
[PATCH 1/3] s390: gup_huge_pmd() support THP tail recounting · Andrea Arcangeli <hidden> · 2011-10-17
[PATCH 3/3] thp: share get_huge_page_tail() · Andrea Arcangeli <hidden> · 2011-10-17
[PATCH 2/3] sparc: gup_pte_range() support THP based tail recounting · Andrea Arcangeli <hidden> · 2011-10-17
Re: [PATCH 2/3] sparc: gup_pte_range() support THP based tail recounting · David Miller <davem@davemloft.net> · 2011-10-17
fix two more s390/sparc gup_fast bugs · Andrea Arcangeli <hidden> · 2011-10-17
[PATCH 1/2] s390: gup_huge_pmd() return 0 if pte changes · Andrea Arcangeli <hidden> · 2011-10-17
[PATCH 2/2] powerpc: gup_huge_pmd() return 0 if pte changes · Andrea Arcangeli <hidden> · 2011-10-17
Re: [PATCH] thp: tail page refcounting fix #4 · Michel Lespinasse <hidden> · 2011-08-29
Re: [PATCH] thp: tail page refcounting fix #4 · Andrea Arcangeli <hidden> · 2011-08-29
Re: [PATCH] thp: tail page refcounting fix #2 · Andrea Arcangeli <hidden> · 2011-08-26

STALE5360d

From: Michel Lespinasse <hidden>
Date: 2011-08-19 07:49:29
Also in: lkml
Subsystem: hwpoison memory failure handling, memory hot(un)plug, memory management, memory management - core, memory management - mglru (multi-gen lru), memory management - reclaim, page cache, the rest · Maintainers: Miaohe Lin, David Hildenbrand, Oscar Salvador, Andrew Morton, Johannes Weiner, Matthew Wilcox, Linus Torvalds

In order to guarantee that page counts are stable one rcu grace period
after page allocation, it is important that get_page_unless_zero
call sites follow the proper protocol and hold the rcu read lock from
the time they locate the desired page until they get a reference on it.

__isolate_lru_page() is exempted - it knows the page it's trying to get
a reference on can't get fully freed, as it is on LRU list and it holds
the zone LRU lock.

Other call sites in memory_hotplug.c, memory_failure.c and hwpoison-inject.c
are also exempted. It would be preferable if someone more familiar with
these features could determine if that's safe.

Signed-off-by: Michel Lespinasse <redacted>
---
 include/linux/mm.h      |   16 +++++++++++++++-
 include/linux/pagemap.h |    1 +
 mm/hwpoison-inject.c    |    2 +-
 mm/memory-failure.c     |    6 +++---
 mm/memory_hotplug.c     |    2 +-
 mm/vmscan.c             |    7 ++++++-
 6 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..9ff5f2d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h

@@ -275,13 +275,27 @@ static inline int put_page_testzero(struct page *page)
 	return atomic_dec_and_test(&page->_count);
 }
 
+static inline int __get_page_unless_zero(struct page *page)
+{
+	return atomic_inc_not_zero(&page->_count);
+}
+
 /*
  * Try to grab a ref unless the page has a refcount of zero, return false if
  * that is the case.
  */
 static inline int get_page_unless_zero(struct page *page)
 {
-	return atomic_inc_not_zero(&page->_count);
+	/*
+	 * See page_cache_get_speculative() comment in pagemap.h
+	 * Note that for page counts to be guaranteed stable one
+	 * RCU grace period after they've been allocated,
+	 * all get_page_unless_zero call sites have to participate
+	 * by taking an rcu read lock before locating the desired page
+	 * and until getting a reference on it.
+	 */
+	VM_BUG_ON(!rcu_read_lock_held());
+	return __get_page_unless_zero(page);
 }
 
 extern int page_is_ram(unsigned long pfn);

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716875e..736f47d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h

@@ -131,6 +131,7 @@ void release_pages(struct page **pages, int nr, int cold);
  */
 static inline int page_cache_get_speculative(struct page *page)
 {
+	VM_BUG_ON(!rcu_read_lock_held());
 	VM_BUG_ON(in_interrupt());
 
 #if !defined(CONFIG_SMP) && defined(CONFIG_TREE_RCU)

diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index c7fc7fd..87e027b 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c

@@ -30,7 +30,7 @@ static int hwpoison_inject(void *data, u64 val)
 	/*
 	 * This implies unable to support free buddy pages.
 	 */
-	if (!get_page_unless_zero(hpage))
+	if (!__get_page_unless_zero(hpage))
 		return 0;
 
 	if (!PageLRU(p) && !PageHuge(p))

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 740c4f5..6fc0409 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c

@@ -1025,7 +1025,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	 * that may make page_freeze_refs()/page_unfreeze_refs() mismatch.
 	 */
 	if (!(flags & MF_COUNT_INCREASED) &&
-		!get_page_unless_zero(hpage)) {
+		!__get_page_unless_zero(hpage)) {
 		if (is_free_buddy_page(p)) {
 			action_result(pfn, "free buddy", DELAYED);
 			return 0;

@@ -1210,7 +1210,7 @@ int unpoison_memory(unsigned long pfn)
 
 	nr_pages = 1 << compound_trans_order(page);
 
-	if (!get_page_unless_zero(page)) {
+	if (!__get_page_unless_zero(page)) {
 		/*
 		 * Since HWPoisoned hugepage should have non-zero refcount,
 		 * race between memory failure and unpoison seems to happen.

@@ -1289,7 +1289,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 	 * When the target page is a free hugepage, just remove it
 	 * from free hugepage list.
 	 */
-	if (!get_page_unless_zero(compound_head(p))) {
+	if (!__get_page_unless_zero(compound_head(p))) {
 		if (PageHuge(p)) {
 			pr_info("get_any_page: %#lx free huge page\n", pfn);
 			ret = dequeue_hwpoisoned_huge_page(compound_head(p));

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c46887b..cf57dfc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c

@@ -710,7 +710,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
-		if (!get_page_unless_zero(page))
+		if (!__get_page_unless_zero(page))
 			continue;
 		/*
 		 * We can skip free pages. And we can only deal with pages on

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d036e59..4c167da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c

@@ -1001,11 +1001,16 @@ int __isolate_lru_page(struct page *page, int mode, int file)
 
 	ret = -EBUSY;
 
-	if (likely(get_page_unless_zero(page))) {
+	if (likely(__get_page_unless_zero(page))) {
 		/*
 		 * Be careful not to clear PageLRU until after we're
 		 * sure the page is not being freed elsewhere -- the
 		 * page release code relies on it.
+		 *
+		 * We are able to use the __get_page_unless_zero() variant
+		 * because we know the page can't get fully freed before we
+		 * get the reference on it - as it is on LRU list and we
+		 * hold the zone LRU lock.
 		 */
 		ClearPageLRU(page);
 		ret = 0;

-- 
1.7.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help