Thread (22 messages) 22 messages, 5 authors, 2021-07-29

Re: [PATCH v1 1/6] mm/hwpoison: mf_mutex for soft offline and unpoison

From: Miaohe Lin <linmiaohe@huawei.com>
Date: 2021-06-16 03:14:38

On 2021/6/16 8:41, HORIGUCHI NAOYA(堀口 直也) wrote:
On Tue, Jun 15, 2021 at 08:42:23PM +0800, Miaohe Lin wrote:
...
quoted
quoted
@@ -1960,7 +1964,7 @@ int unpoison_memory(unsigned long pfn)
 	if (!PageHuge(page) && PageTransHuge(page)) {
 		unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
Maybe it's more appropriate to start mutex_lock(&mf_mutex) here? I think these races start here.
Hi Miaohe,

Thank your for the review.
quoted
Consider that we put mutex_lock() here, and let's think about two concurrent
calls of unpoison_memory(), then these events could be processed like below:

    CPU 0                             CPU 1
    unpoison_memory
    check PageHWPoison // true
                                      unpoison_memory
                                      check PageHWPoison  // true
    mutex_lock
    get_hwpoison_page
    TestClearPageHWPoison
    put_page
    put_page // freed
    mutex_unlock

    // the unpoisoned page can be used for allocation

                                      mutex_lock
                                      get_hwpoison_page // succeeds
                                      ... // unpoison the !PageHWPoison page !?


So I thought that we had better do the prechecks in mf_mutex.  Maybe the 2nd
unpoison_memory() just get and put the page refcount by 1 even in this race,
so the impact is not so big, but I feel like avoiding "unpoison the
!PageHWPoison page" situation.

Does it make sense for you?
Yes, I missed the races inside unpoison_memory() itself.
Many thanks for your detailed explanation!
Thanks,
Naoya Horiguchi
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help