Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

From: HORIGUCHI NAOYA (堀口　直也) <hidden>
Date: 2021-03-31 01:54:07
Also in: lkml

On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:

On 26.03.21 15:09, David Hildenbrand wrote:

On 22.03.21 12:33, Aili Yao wrote:

When we dump core for a user process signal, the signal may be a SIGBUS
with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it resulted from an
ECC memory failure such as SRAR or SRAO. We expect the memory recovery work
to have finished correctly, so that get_dump_page() will not return the
error page, because memory_failure() has set the process's pte invalid.

But memory_failure() may fail, and the process's related pte may not be
correctly set invalid. With the current code, we then return the poisoned
page and dump it, which leads to a system panic because the access happens
in kernel code.

So check the hwpoison status in get_dump_page(), and if TRUE, return NULL.

There may be other scenarios where it is also better to check the hwpoison
status and not panic, so make a wrapper for this check. Thanks to David's
suggestion ([off-list ref]).

Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
Signed-off-by: Aili Yao <redacted>
Cc: David Hildenbrand <redacted>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <redacted>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <redacted>
Cc: Aili Yao <redacted>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
   mm/gup.c      |  4 ++++
   mm/internal.h | 20 ++++++++++++++++++++
   2 files changed, 24 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index e4c224c..6f7e1aa 100644
--- a/mm/gup.c
+++ b/mm/gup.c

@@ -1536,6 +1536,10 @@ struct page *get_dump_page(unsigned long addr)
   				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
   	if (locked)
   		mmap_read_unlock(mm);

Thinking again, wouldn't we get -EFAULT from __get_user_pages_locked()
when stumbling over a hwpoisoned page?

See __get_user_pages_locked()->__get_user_pages()->faultin_page():

handle_mm_fault()->vm_fault_to_errno(), which translates
VM_FAULT_HWPOISON to -EFAULT, unless FOLL_HWPOISON is set (-> -EHWPOISON)

?
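
For reference, the translation described above can be modelled in userspace
roughly as follows. This is a minimal sketch, not the kernel source: the
MODEL_* flag values are made up for illustration, and
model_vm_fault_to_errno() is a hypothetical stand-in for
vm_fault_to_errno().

```c
#include <errno.h>

/* Fallback for libcs that do not define it; Linux headers do. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Illustrative bit values only (assumption); they are not the kernel's. */
#define MODEL_VM_FAULT_OOM            0x01u
#define MODEL_VM_FAULT_SIGBUS         0x02u
#define MODEL_VM_FAULT_HWPOISON       0x10u
#define MODEL_VM_FAULT_HWPOISON_LARGE 0x20u
#define MODEL_VM_FAULT_SIGSEGV        0x40u
#define MODEL_FOLL_HWPOISON           0x100u

/*
 * Map a fault result to an errno. A hwpoison fault becomes -EHWPOISON
 * only when the caller asked for it via the FOLL_HWPOISON-like flag;
 * otherwise it is reported as a plain -EFAULT.
 */
static int model_vm_fault_to_errno(unsigned int vm_fault,
				   unsigned int foll_flags)
{
	if (vm_fault & MODEL_VM_FAULT_OOM)
		return -ENOMEM;
	if (vm_fault & (MODEL_VM_FAULT_HWPOISON | MODEL_VM_FAULT_HWPOISON_LARGE))
		return (foll_flags & MODEL_FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
	if (vm_fault & (MODEL_VM_FAULT_SIGBUS | MODEL_VM_FAULT_SIGSEGV))
		return -EFAULT;
	return 0;
}
```

Note that this mapping only applies when the fault path actually runs, i.e.
when the lookup does not find a present PTE.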

We could get -EFAULT, but sometimes not (depends on how memory_failure() fails).

If we failed to unmap, the page table entry is not converted to a hwpoison
entry, so __get_user_pages_locked() gets the hwpoisoned page.

If we successfully unmapped but failed in truncate_error_page() for example,
the processes mapping the page would get -EFAULT as expected.  But even in
this case, other processes could reach the error page via page cache and
__get_user_pages_locked() for them could return the hwpoisoned page.

Or doesn't that happen as you describe "But memory_failure() may fail, and
the process's related pte may not be correctly set invalid" -- but why does
that happen?

Simply because memory_failure() doesn't handle some page types, like KSM pages
and the zero page. Or maybe shmem thp also belongs to this class.

On a similar thought, should get_user_pages() never return a page that has
HWPoison set? E.g., check also for existing PTEs if the page is hwpoisoned?

Makes sense to me. Maybe inserting a hwpoison check into follow_page_pte() and
follow_huge_pmd() would work well.
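
To illustrate the idea, here is a small userspace model of such a check. It
is only a sketch under stated assumptions: struct model_page, struct
model_pte and model_follow_pte() are hypothetical names, not kernel code,
and the real follow_page_pte() would return an error pointer rather than
NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with only a hwpoison flag. */
struct model_page {
	bool hwpoison;
};

/* Hypothetical stand-in for a PTE. */
struct model_pte {
	struct model_page *page;	/* mapped page, if still present */
	bool hwpoison_entry;		/* set when unmap converted the PTE */
};

/*
 * Look up the page behind a PTE. With check_hwpoison set, a still-mapped
 * page that is flagged as poisoned is refused, which covers the case
 * where memory_failure() could not unmap it.
 */
static struct model_page *model_follow_pte(const struct model_pte *pte,
					   bool check_hwpoison)
{
	if (pte->hwpoison_entry)
		return NULL;	/* unmap succeeded: fault path reports the error */
	if (check_hwpoison && pte->page && pte->page->hwpoison)
		return NULL;	/* unmap failed: refuse the stale mapping */
	return pte->page;
}
```

With a PTE that still maps a poisoned page, the lookup hands the page back
when check_hwpoison is false and refuses it when the check is enabled.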

Thanks,
Naoya Horiguchi
