Thread (40 messages), 4 authors, 2021-03-17

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

From: Aili Yao <hidden>
Date: 2021-03-04 04:21:43
Also in: linux-mm, lkml


On Thu, 4 Mar 2021 10:16:53 +0800
Aili Yao [off-list ref] wrote:
On Wed, 3 Mar 2021 15:41:35 +0000
"Luck, Tony" [off-list ref] wrote:
quoted
quoted
As for the wrong address in the SIGBUS, I think this is not an issue caused by the patch I posted; the issue was already there before my patch.
I can't find a workable way to get the correct address, for the same reason: we don't know whether the page mapping is there or not when
we get to kill_me_maybe(). In some cases we may get it, but there are a lot of concurrency issues to consider, and if it fails we have to fall back
to the error branch again. Keeping the current code may be the easy option.
My RFC patch from yesterday removes the uncertainty about whether the page is there or not. After it walks the page
tables we know that the poison page isn't mapped (note that patch is RFC for a reason ... I'm 90% sure that it should
do a bit more than just clear the PRESENT bit).

So perhaps memory_failure() has queued a SIGBUS for this task; if so, we take it when we return from kill_me_maybe().
And when this happens, the process will receive a SIGBUS at AO level. Is that proper, given that it is not AR?
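
For reference, the AO/AR distinction asked about here is visible to user space as the si_code of the SIGBUS: BUS_MCEERR_AO (action optional) versus BUS_MCEERR_AR (action required). The following is only a minimal user-space sketch of how a process could tell the two apart. The helper names mce_sigbus_kind() and install_sigbus_handler() are hypothetical and not from this thread, and the fallback values 4 and 5 are only used if the libc headers do not already define the constants.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks for older headers; the values match the Linux uapi definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: classify a SIGBUS si_code. */
static const char *mce_sigbus_kind(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "action required";
	case BUS_MCEERR_AO:
		return "action optional";
	default:
		return "not a machine-check SIGBUS";
	}
}

/* Report the kind of SIGBUS; only an AR error forces the process to stop. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	const char *kind = mce_sigbus_kind(info->si_code);
	ssize_t n;

	(void)sig;
	(void)ucontext;

	/* write() is async-signal-safe, unlike printf(). */
	n = write(STDERR_FILENO, kind, strlen(kind));
	n = write(STDERR_FILENO, "\n", 1);
	(void)n;

	if (info->si_code == BUS_MCEERR_AR)
		_exit(1);	/* the faulting access cannot be retried */
}

/* Hypothetical helper: install the handler with SA_SIGINFO so si_code is delivered. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A process that only ever sees the AO code for an error it actually consumed would take the "optional" path in a handler like this, which is the concern raised above.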
quoted
If not, we will return to user mode and re-execute the failing instruction ... but because the page is unmapped we will take a #PF
Got it, I had some mistaken ideas here.

quoted
The x86 page fault handler will see that the page for this physical address is marked HWPOISON, and it will send the SIGBUS
(just like it does if the page had been removed by an earlier UCNA/SRAO error).
If your method works, should it be like this?

        pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
        if (PageHuge(page)) {
                hugetlb_count_sub(compound_nr(page), mm);
                set_huge_swap_pte_at(mm, address,
                                     pvmw.pte, pteval,
                                     vma_mmu_pagesize(vma));
        } else {
                dec_mm_counter(mm, mm_counter(page));
                set_pte_at(mm, address, pvmw.pte, pteval);
        }

The page fault path checks whether it's a poison page using is_hwpoison_entry().
And if it works, do we need some locking mechanism before we call walk_page_range()?
If we lock, do we need to handle the case where the blocking lock is interrupted, as other places do?
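
To illustrate the first point, here is a toy user-space model of why clearing only the PRESENT bit differs from installing a hwpoison entry as in the snippet above. It is a sketch under loud assumptions: the bit layout, the type number 30, and every toy_* name are hypothetical and do not match the real x86 PTE or swap-entry encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy layout, NOT the real x86 encoding:
 *   bit 0      present
 *   bits 1..5  swap type (only meaningful when not present)
 *   bits 6..63 offset (here: the page frame number)
 */
#define TOY_PTE_PRESENT   UINT64_C(1)
#define TOY_TYPE_SHIFT    1
#define TOY_TYPE_BITS     5
#define TOY_TYPE_MASK     ((UINT64_C(1) << TOY_TYPE_BITS) - 1)
#define TOY_OFFSET_SHIFT  (TOY_TYPE_SHIFT + TOY_TYPE_BITS)
#define TOY_MAX_PFN       (UINT64_MAX >> TOY_OFFSET_SHIFT)
#define TOY_SWP_HWPOISON  UINT64_C(30)	/* hypothetical type number */

typedef uint64_t toy_pte_t;

/* A normal mapping: PFN plus the present bit, type bits left at zero. */
static toy_pte_t toy_make_present_pte(uint64_t pfn)
{
	assert(pfn <= TOY_MAX_PFN);
	return (pfn << TOY_OFFSET_SHIFT) | TOY_PTE_PRESENT;
}

/* Models swp_entry_to_pte(make_hwpoison_entry(page)): not present, poison type. */
static toy_pte_t toy_make_hwpoison_pte(uint64_t pfn)
{
	assert(pfn <= TOY_MAX_PFN);
	return (pfn << TOY_OFFSET_SHIFT) | (TOY_SWP_HWPOISON << TOY_TYPE_SHIFT);
}

/* Models "just clear the PRESENT bit": the type bits stay zero. */
static toy_pte_t toy_clear_present(toy_pte_t pte)
{
	return pte & ~TOY_PTE_PRESENT;
}

/* Models the is_hwpoison_entry() check on a non-present entry. */
static bool toy_is_hwpoison_pte(toy_pte_t pte)
{
	if (pte & TOY_PTE_PRESENT)
		return false;
	return ((pte >> TOY_TYPE_SHIFT) & TOY_TYPE_MASK) == TOY_SWP_HWPOISON;
}

/* Recover the PFN stored in the offset field. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte >> TOY_OFFSET_SHIFT;
}
```

In this model, toy_is_hwpoison_pte() is true for toy_make_hwpoison_pte(pfn) but false for toy_clear_present(toy_make_present_pte(pfn)), which is the gap the question above is pointing at. Whether the real fault path relies only on this check is for the kernel source to answer, not this sketch.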

-- 
Thanks!
Aili Yao