Re: [PATCH v5 3/5] mm: Make alloc_contig_range handle free hugetlb pages
From: Michal Hocko <mhocko@suse.com>
Date: 2021-03-17 14:24:00
On Wed 17-03-21 12:12:49, Oscar Salvador wrote:
alloc_contig_range will fail if it ever sees a HugeTLB page within the range
we are trying to allocate, even when that page is free and can be easily
reallocated. This has proved to be problematic for some users of
alloc_contig_range, e.g. CMA and virtio-mem, where those would fail the call
even when those pages lie in ZONE_MOVABLE and are free.
We can do better by trying to replace such a page.
Free hugepages are tricky to handle: so that no userspace application notices
disruption, we need to replace the current free hugepage with a new one.
In order to do that, a new function called alloc_and_dissolve_huge_page is
introduced. This function will first try to get a new fresh hugepage, and if
it succeeds, it will replace the old one in the free hugepage pool.
All operations are being handled under hugetlb_lock, so no races are
Slightly confusing, because the allocation, which is part of the process, is certainly not done under the lock. "The free page replacement is done under hugetlb_lock, so no external user of hugetlb will notice the change. There is one tricky case when the page's refcount is 0 because it is in the process of being released. A missing PageHugeFreed bit will tell us that freeing is in flight, so we retry after dropping the hugetlb_lock. The race window should be small and the next retry should make forward progress."
possible. The only exception is when page's refcount is 0, but it still
has not been flagged as PageHugeFreed.
E.g., in the scenario below:
    CPU0                            CPU1
    __free_huge_page()              isolate_or_dissolve_huge_page
                                      PageHuge() == T
                                      alloc_and_dissolve_huge_page
                                        alloc_fresh_huge_page()
                                        spin_lock(hugetlb_lock)
                                        // PageHuge() && !PageHugeFreed &&
                                        // !PageCount()
                                        spin_unlock(hugetlb_lock)
      spin_lock(hugetlb_lock)
      1) update_and_free_page
           PageHuge() == F
           __free_pages()
      2) enqueue_huge_page
           SetPageHugeFreed()
      spin_unlock(&hugetlb_lock)
                                      spin_lock(hugetlb_lock)
                                      1) PageHuge() == F (freed by case#1 from CPU0)
                                      2) PageHuge() == T
                                           PageHugeFreed() == T
                                           - proceed with replacing the page
In the case above we retry, as the race window is quite small and we have a
high chance of succeeding next time.
With regard to the allocation, we restrict it to the node the page belongs
to with __GFP_THISNODE, meaning we do not fall back on other nodes' zones.
Note that gigantic hugetlb pages are fenced off since there is a cyclic
dependency between them and alloc_contig_range.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <redacted>
Acked-by: Michal Hocko <mhocko@suse.com>
My ack still applies.
-- 
Michal Hocko
SUSE Labs