Thread (22 messages) 22 messages, 5 authors, 2021-07-29

Re: [PATCH v1 1/6] mm/hwpoison: mf_mutex for soft offline and unpoison

From: HORIGUCHI NAOYA (堀口 直也) <hidden>
Date: 2021-06-16 00:41:37

On Tue, Jun 15, 2021 at 08:42:23PM +0800, Miaohe Lin wrote:
...
quoted
@@ -1960,7 +1964,7 @@ int unpoison_memory(unsigned long pfn)
 	if (!PageHuge(page) && PageTransHuge(page)) {
 		unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}
 
Maybe it's more appropriate to start mutex_lock(&mf_mutex) here? I think these races start here.
Hi Miaohe,

Thank your for the review.

Consider that we put mutex_lock() here, and let's think about two concurrent
calls of unpoison_memory(), then these events could be processed like below:

    CPU 0                             CPU 1
    unpoison_memory
    check PageHWPoison // true
                                      unpoison_memory
                                      check PageHWPoison  // true
    mutex_lock
    get_hwpoison_page
    TestClearPageHWPoison
    put_page
    put_page // freed
    mutex_unlock

    // the unpoisoned page can be used for allocation

                                      mutex_lock
                                      get_hwpoison_page // succeeds
                                      ... // unpoison the !PageHWPoison page !?


So I thought that we had better do the prechecks in mf_mutex.  Maybe the 2nd
unpoison_memory() just get and put the page refcount by 1 even in this race,
so the impact is not so big, but I feel like avoiding "unpoison the
!PageHWPoison page" situation.

Does it make sense for you?

Thanks,
Naoya Horiguchi
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help