Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages... | linux-doc

[PATCH v16 0/9] Free some vmemmap pages of HugeTLB page · Muchun Song <hidden> · 2021-02-19
[PATCH v16 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c · Muchun Song <hidden> · 2021-02-19
[PATCH v16 3/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-19
[PATCH v16 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP · Muchun Song <hidden> · 2021-02-19
[PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-19
Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Michal Hocko <mhocko@suse.com> · 2021-02-19
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-20
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Michal Hocko <mhocko@suse.com> · 2021-02-22
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-22
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-22
Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Mike Kravetz <hidden> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-23
Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-23
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Muchun Song <hidden> · 2021-02-24
Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · Oscar Salvador <osalvador@suse.de> · 2021-02-24
[PATCH v16 5/9] mm: hugetlb: set the PageHWPoison to the raw error page · Muchun Song <hidden> · 2021-02-19
[PATCH v16 7/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate · Muchun Song <hidden> · 2021-02-19
[PATCH v16 8/9] mm: hugetlb: gather discrete indexes of tail page · Muchun Song <hidden> · 2021-02-19
[PATCH v16 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap · Muchun Song <hidden> · 2021-02-19
[PATCH v16 9/9] mm: hugetlb: optimize the code with the help of the compiler · Muchun Song <hidden> · 2021-02-19

Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

From: Muchun Song <hidden>
Date: 2021-02-24 03:49:31
Also in: linux-fsdevel, linux-mm, lkml

On Wed, Feb 24, 2021 at 6:32 AM Oscar Salvador [off-list ref] wrote:

On Tue, Feb 23, 2021 at 04:41:28PM +0100, Oscar Salvador wrote:

quoted

On Tue, Feb 23, 2021 at 11:50:05AM +0100, Oscar Salvador wrote:

quoted

CPU0:                           CPU1:
                                set_compound_page_dtor(HUGETLB_PAGE_DTOR);
memory_failure_hugetlb
  get_hwpoison_page
    __get_hwpoison_page
      get_page_unless_zero
                                put_page_testzero()

Maybe this can happen. But it is a very corner case. If we want to
deal with this. We can put_page_testzero() first and then
set_compound_page_dtor(HUGETLB_PAGE_DTOR).

I have to check further, but it looks like this could actually happen.
Handling this with VM_BUG_ON is wrong, because memory_failure/soft_offline are
entitled to increase the refcount of the page.

AFAICS,

 CPU0:                                    CPU1:
                                          set_compound_page_dtor(HUGETLB_PAGE_DTOR);
 memory_failure_hugetlb
   get_hwpoison_page
     __get_hwpoison_page
       get_page_unless_zero
                                          put_page_testzero()
        identify_page_state
         me_huge_page

I think we can reach me_huge_page with either refcount = 1 or refcount =2,
depending whether put_page_testzero has been issued.

For now, I would not re-enqueue the page if put_page_testzero == false.
I have to see how this can be handled gracefully.

I took a brief look.
It is not really your patch fault. Hugetlb <-> memory-failure synchronization is
a bit odd, it definitely needs improvment.

The thing is, we can have different scenarios here.
E.g: by the time we return from put_page_testzero, we might have refcount ==
0 and PageHWPoison, or refcount == 1 PageHWPoison.

The former will let a user get a page from the pool and get a sigbus
when it faults in the page, and the latter will be even more odd as we
will have a self-refcounted page in the free pool (and hwpoisoned).

I have been looking at the dequeue_huge_page_node_exact().
If a PageHWPoison huge page is in the free pool list, the page will
not be allocated to the user. The PageHWPoison huge page
will be skip in the dequeue_huge_page_node_exact().

quoted

As I said, it is not this patchset fault. I just made me realize this
problem.

I have to think some more about this.

I have been thinking more about this.
memory failure events can occur at any time, and we might not be in a
position where we can handle gracefully the error, meaning that the page
might end up in non desirable state.

E.g: we could flag the page right before enqueing it.

I still think that VM_BUG_ON should go, as the refcount can be perfectly
increased by memory-failure/soft_offline handlers, so BUGing there does
not make much sense.

Make sense. I will remove the VM_BUG_ON.

One think we could do is to check the state of the page we want to
retrieve from the free hugepage pool.
We should discard any HWpoisoned ones, and dissolve them.

The thing is, memory-failure/soft_offline should allocate a new hugepage
for the free pool, so keep the pool stable.
Something like [1].

Anyway, this is orthogonal to this patch, and something I will work on
soon.

[1] https://lore.kernel.org/linux-mm/20210222135137.25717-2-osalvador@suse.de/T/#u (local)

Thanks for your efforts on this.

--
Oscar Salvador
SUSE L3

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help