Re: [PATCH v1 1/6] mm/hwpoison: mf_mutex for soft offline and unpoison
From: HORIGUCHI NAOYA (堀口 直也) <hidden>
Date: 2021-06-16 00:41:37
On Tue, Jun 15, 2021 at 08:42:23PM +0800, Miaohe Lin wrote: ...
quoted
@@ -1960,7 +1964,7 @@ int unpoison_memory(unsigned long pfn) if (!PageHuge(page) && PageTransHuge(page)) { unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", pfn, &unpoison_rs); - return 0; + goto unlock_mutex; }Maybe it's more appropriate to start mutex_lock(&mf_mutex) here? I think these races start here.
Hi Miaohe,
Thank your for the review.
Consider that we put mutex_lock() here, and let's think about two concurrent
calls of unpoison_memory(), then these events could be processed like below:
CPU 0 CPU 1
unpoison_memory
check PageHWPoison // true
unpoison_memory
check PageHWPoison // true
mutex_lock
get_hwpoison_page
TestClearPageHWPoison
put_page
put_page // freed
mutex_unlock
// the unpoisoned page can be used for allocation
mutex_lock
get_hwpoison_page // succeeds
... // unpoison the !PageHWPoison page !?
So I thought that we had better do the prechecks in mf_mutex. Maybe the 2nd
unpoison_memory() just get and put the page refcount by 1 even in this race,
so the impact is not so big, but I feel like avoiding "unpoison the
!PageHWPoison page" situation.
Does it make sense for you?
Thanks,
Naoya Horiguchi