Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy
From: Ding Hui <hidden>
Date: 2021-05-07 01:46:54
Also in:
lkml
On 2021/5/6 15:30, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote:
>> On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:
>>>> On 21.04.21 04:04, Ding Hui wrote:
>>>>> Recently we found that a lot of MemFree is left in /proc/meminfo
>>>>> after soft-offlining many pages.
>>>>>
>>>>> I think this is incorrect, since NR_FREE_PAGES should not include
>>>>> HWPoison pages. After take_page_off_buddy(), the page no longer
>>>>> belongs to the buddy allocator and will not be used any more, but
>>>>> we may have missed adjusting NR_FREE_PAGES in this situation.
>>>>>
>>>>> Signed-off-by: Ding Hui <redacted>
>>>>> ---
>>>>>  mm/page_alloc.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index cfc72873961d..8d65b62784d8 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
>>>>>  			del_page_from_free_list(page_head, zone, page_order);
>>>>>  			break_down_buddy_pages(zone, page_head, page, 0,
>>>>>  						page_order, migratetype);
>>>>> +			__mod_zone_page_state(zone, NR_FREE_PAGES, -1);
>>>>>  			ret = true;
>>>>>  			break;
>>>>>  		}
>>>>
>>>> Should this use __mod_zone_freepage_state() instead?
>>>
>>> Yes, __mod_zone_freepage_state() looks better to me.
>>>
>>> And I think an additional __mod_zone_freepage_state() in
>>> unpoison_memory() may be necessary to cancel the decrement.
>>> I thought of the following, but it doesn't build because
>>> get_pfnblock_migratetype() is available only in mm/page_alloc.c,
>>> so you might want to add a small exported routine in
>>> mm/page_alloc.c and call it from unpoison_memory().
>>>
>>> @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
>>>  	}
>>>
>>>  	if (!get_hwpoison_page(p, flags, 0)) {
>>> -		if (TestClearPageHWPoison(p))
>>> +		if (TestClearPageHWPoison(p)) {
>>> +			int migratetype = get_pfnblock_migratetype(p, pfn);
>>> +
>>>  			num_poisoned_pages_dec();
>>> +			__mod_zone_freepage_state(page_zone(p), 1, migratetype);
>>> +		}
>>>  		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>>>  				 pfn, &unpoison_rs);
>>>  		return 0;
>>
>> I think there is another problem:
>> In the normal case, we keep the last refcount of the hwpoison page,
>> so get_hwpoison_page() should return 1.
>> NR_FREE_PAGES will be adjusted when put_page() is called.
>
> I think that take_page_off_buddy() should not be called in this case
> (the error page has a remaining refcount), so it seems there is no
> need to update NR_FREE_PAGES?
Yes, take_page_off_buddy() is only used for free pages, but we call page_ref_inc() after that. For in-use pages, on the other hand, we increase the refcount via get_any_page(). So in both cases the hwpoisoned page has a refcount greater than zero.

I think there is no need to update NR_FREE_PAGES explicitly in unpoison_memory(); put_page() will update NR_FREE_PAGES for us and put the page back into the buddy allocator.
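
To make the accounting argument concrete, here is a minimal userspace sketch. It is not kernel code: struct toy_page, struct toy_zone and every toy_* function are hypothetical stand-ins, and the model assumes take_page_off_buddy() decrements the free counter as the patch proposes. It only illustrates that the final put_page() is what restores the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct page and struct zone. */
struct toy_page {
	int refcount;	/* models the page refcount */
	bool in_buddy;	/* true while the page sits on a free list */
	bool hwpoison;	/* models the HWPoison flag */
};

struct toy_zone {
	long nr_free_pages;	/* models NR_FREE_PAGES */
};

/* Models take_page_off_buddy() with the proposed accounting fix. */
static bool toy_take_page_off_buddy(struct toy_zone *z, struct toy_page *p)
{
	if (!p->in_buddy)
		return false;
	p->in_buddy = false;
	z->nr_free_pages--;
	return true;
}

/* Models put_page(): dropping the last reference frees the page. */
static void toy_put_page(struct toy_zone *z, struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->in_buddy = true;
		z->nr_free_pages++;
	}
}

/* Models soft offline of a free page: isolate, poison, pin. */
static bool toy_soft_offline_free_page(struct toy_zone *z, struct toy_page *p)
{
	if (!toy_take_page_off_buddy(z, p))
		return false;
	p->hwpoison = true;
	p->refcount++;		/* stands in for page_ref_inc() */
	return true;
}

/* Models unpoison of a page that still holds the poison reference. */
static void toy_unpoison(struct toy_zone *z, struct toy_page *p)
{
	p->refcount++;		/* stands in for get_hwpoison_page() */
	p->hwpoison = false;
	toy_put_page(z, p);	/* drop the reference just taken */
	toy_put_page(z, p);	/* drop the reference held since poisoning */
}
```

In this model, calling toy_soft_offline_free_page() and then toy_unpoison() on the same page leaves nr_free_pages where it started, without any explicit counter adjustment in the unpoison path.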
>> In the race below, we may leak the page because unpoison_memory()
>> does not put it back into the buddy allocator, even though the
>> HWPoison flag, num_poisoned_pages, and NR_FREE_PAGES are all
>> adjusted correctly.
>>
>> CPU0                            CPU1
>>
>> soft_offline_page
>>   soft_offline_free_page
>>     page_handle_poison
>>       take_page_off_buddy
>>       SetPageHWPoison
>>                                 unpoison_memory
>>                                   if (!get_hwpoison_page(p))
>>                                     TestClearPageHWPoison
>>                                     num_poisoned_pages_dec
>>                                     __mod_zone_freepage_state
>>                                     return 0
>>                                     /* miss put the page back to buddy */
>>       page_ref_inc
>>       num_poisoned_pages_inc
>
> Thanks for checking this, unpoison_memory() is racy. We recently
> proposed introducing mf_mutex in [1]. Although that patch is not
> merged into mainline yet, it could be used to prevent the above
> race too.
>
> [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/
I'll look forward to it, thanks.
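
The interleaving discussed above can be replayed deterministically in userspace. The sketch below is only an illustration under stated assumptions: struct toy_state and the cpu0_*/cpu1_*/replay_* functions are hypothetical names, the zero-refcount branch includes the proposed NR_FREE_PAGES increment, and "serialized" simply means the two operations run one after the other, which is the ordering a lock such as the proposed mf_mutex would be expected to enforce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page plus the two global counters. */
struct toy_state {
	int refcount;
	bool in_buddy;
	bool hwpoison;
	long nr_free_pages;
	long num_poisoned_pages;
};

/* CPU0, first half: take_page_off_buddy + SetPageHWPoison. */
static void cpu0_offline_begin(struct toy_state *s)
{
	assert(s->in_buddy && s->refcount == 0);
	s->in_buddy = false;
	s->nr_free_pages--;
	s->hwpoison = true;
}

/* CPU0, second half: page_ref_inc + num_poisoned_pages_inc. */
static void cpu0_offline_end(struct toy_state *s)
{
	s->refcount++;
	s->num_poisoned_pages++;
}

/* CPU1: simplified unpoison_memory(). */
static int cpu1_unpoison(struct toy_state *s)
{
	if (s->refcount == 0) {
		/* Models the !get_hwpoison_page() branch. */
		if (s->hwpoison) {
			s->hwpoison = false;
			s->num_poisoned_pages--;
			s->nr_free_pages++;
		}
		return 0;	/* page is never put back on a free list */
	}
	/* Normal path: clear the flag and drop the poison reference. */
	s->hwpoison = false;
	s->num_poisoned_pages--;
	if (--s->refcount == 0) {
		s->in_buddy = true;
		s->nr_free_pages++;
	}
	return 0;
}

static struct toy_state toy_initial(void)
{
	struct toy_state s = {
		.refcount = 0, .in_buddy = true, .hwpoison = false,
		.nr_free_pages = 100, .num_poisoned_pages = 0,
	};
	return s;
}

/* Replays the racy order: unpoison runs in the middle of offline. */
static struct toy_state replay_race(void)
{
	struct toy_state s = toy_initial();

	cpu0_offline_begin(&s);
	cpu1_unpoison(&s);
	cpu0_offline_end(&s);
	return s;
}

/* Replays the serialized order: offline completes before unpoison. */
static struct toy_state replay_serialized(void)
{
	struct toy_state s = toy_initial();

	cpu0_offline_begin(&s);
	cpu0_offline_end(&s);
	cpu1_unpoison(&s);
	return s;
}
```

In the racy replay the flag and both counters end at their initial values, yet the page has refcount 1 and is not on a free list, which is the leak. In the serialized replay the page ends up back in the buddy model with refcount 0.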
>> How about doing nothing and returning -EBUSY (so the caller can
>> retry) when unpoisoning a zero-refcount page, or returning 0 as
>> 230ac719c500 ("mm/hwpoison: don't try to unpoison
>> containment-failed pages") does?
>>
>> @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
>>  	}
>>
>>  	if (!get_hwpoison_page(p, flags, 0)) {
>> -		if (TestClearPageHWPoison(p))
>> -			num_poisoned_pages_dec();
>> -		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>> +		unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n",
>>  				 pfn, &unpoison_rs);
>> -		return 0;
>> +		return -EBUSY;
>
> Currently unpoison_memory() does not work as the reverse operation
> of take_page_off_buddy() (it's simply broken), so implementing it
> all at once would be better. I'll take time to fix unpoison_memory().
Thanks for your work.

Actually, I'm not sure about the exact meaning of "broken"; it seems that the basic function of unpoison_memory() is OK if the race conditions are not considered.

--
Thanks,
- Ding Hui
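
For reference, the "do nothing and return -EBUSY so the caller can retry" idea quoted earlier in this mail could look roughly like the userspace sketch below. It is not the kernel implementation: struct toy_page, toy_unpoison(), toy_unpoison_retry() and toy_finish_offline() are hypothetical names, and the callback merely simulates the concurrent soft offline making progress between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a poisoned page. */
struct toy_page {
	int refcount;
	bool hwpoison;
};

/* Leaves a zero-refcount page untouched and asks the caller to retry. */
static int toy_unpoison(struct toy_page *p)
{
	if (p->refcount == 0)
		return -EBUSY;
	p->hwpoison = false;
	p->refcount--;		/* drop the reference held since poisoning */
	return 0;
}

/* Hypothetical caller: bounded retry while the page reports -EBUSY. */
static int toy_unpoison_retry(struct toy_page *p, int max_tries,
			      void (*between)(struct toy_page *))
{
	int ret = -EBUSY;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		ret = toy_unpoison(p);
		if (ret != -EBUSY)
			break;
		if (between != NULL)
			between(p);	/* simulate the other CPU progressing */
	}
	return ret;
}

/* Simulates the offline side finally taking its reference. */
static void toy_finish_offline(struct toy_page *p)
{
	p->refcount++;
}
```

With a page that is poisoned but still has refcount 0, the first toy_unpoison() call changes nothing and returns -EBUSY; once the simulated offline side has taken its reference, the retry succeeds and clears the flag.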