Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with... | linux-doc

Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-19 14:13:17
Also in: linux-fsdevel, linux-mm, lkml

On Fri 19-02-21 18:49:49, Muchun Song wrote:

When we free a HugeTLB page to the buddy allocator, we should allocate
the vmemmap pages associated with it. But we may cannot allocate vmemmap
pages when the system is under memory pressure, in this case, we just
refuse to free the HugeTLB page instead of looping forever trying to
allocate the pages. This changes some behavior (list below) on some
corner cases.

 1) Failing to free a huge page triggered by the user (decrease nr_pages).

    Need try again later by the user.

 2) Failing to free a surplus huge page when freed by the application.

    Try again later when freeing a huge page next time.

This means that surplus pages can accumulate right? This should be
rather unlikely because one released huge page could then be reused for
normal allocations - including vmemmap. Unlucky timing might still end
up in the accumulation though. Not something critical though.

 3) Failing to dissolve a free huge page on ZONE_MOVABLE via
    offline_pages().

    This is a bit unfortunate if we have plenty of ZONE_MOVABLE memory
    but are low on kernel memory. For example, migration of huge pages
    would still work, however, dissolving the free page does not work.
    This is a corner cases. When the system is that much under memory
    pressure, offlining/unplug can be expected to fail.

Please mention that this is unfortunate because it prevents from the
memory offlining which shouldn't happen for movable zones. People
depending on the memory hotplug and movable zone should carefuly
consider whether savings on unmovable memory are worth losing their
hotplug functionality in some situations.

 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
    alloc_contig_range() - once we have that handling in place. Mainly
    affects CMA and virtio-mem.

What about hugetlb page poisoning on HW failure (resp. soft offlining)?

    Similar to 3). virito-mem will handle migration errors gracefully.
    CMA might be able to fallback on other free areas within the CMA
    region.

We do not want to use GFP_ATOMIC to allocate vmemmap pages. Because it
grants access to memory reserves and we do not think it is reasonable
to use memory reserves. We use GFP_KERNEL in alloc_huge_page_vmemmap().

This likely needs more context around. Maybe something like
"
Vmemmap pages are allocated from the page freeing context. In order for
those allocations to be not disruptive (e.g. trigger oom killer)
__GFP_NORETRY is used. hugetlb_lock is dropped for the allocation
because a non sleeping allocation would be too fragile and it could fail
too easily under memory pressure. GFP_ATOMIC or other modes to access
memory reserves is not used because we want to prevent consuming
reserves under heavy hugetlb freeing.
"

I haven't gone through the patch in a great detail yet, from a high
level POV it looks good although the counter changes and reshuffling
seems little wild. That requires a more detailed look I do not have time
for right now. Mike would be much better for that anywya ;)

I do not see any check for an atomic context in free_huge_page path. I
have suggested to replace in_task by in_atomic check (with a gotcha that
the later doesn't work without preempt_count but there is a work to
address that).
-- 
Michal Hocko
SUSE Labs

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help