Thread (33 messages) 33 messages, 4 authors, 2021-02-19

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-19 09:58:11
Also in: lkml

On Fri 19-02-21 10:05:53, Oscar Salvador wrote:
On Thu, Feb 18, 2021 at 02:59:40PM +0100, Michal Hocko wrote:
quoted
quoted
It should be:

 allocate_a_new_page (new_page's refcount = 1)
 put_page(new_page); (new_page's refcount = 0)
 dissolve_old_page
  : if fail
     dissolve_new_page (we can dissolve it as refcount == 0)

I hope this clarifies it .
OK, I see the problem now. And your above solution is not really
optimal either. Your put_page would add the page to the pool and so it
could be used by somebody. One way around it would be either directly
manipulating reference count which is fugly or you can make it a
temporal page (alloc_migrate_huge_page) or maybe even better not special
case this here but rather allow migrating free hugetlb pages in the
migrate_page path.
I have been weighting up this option because it seemed the most clean way to
proceed. Having the hability to migrate free hugepages directly from migrate_page
would spare us this function.
But there is a problem. migrate_pages needs the pages to be on a list (by
page->lru). That is no problem for used pages, but for freehugepages we would
have to remove the page from hstate->hugepage_freelists, meaning that if userspace
comes along and tries to get a hugepage (a hugepage he previously allocated by
e.g: /proc/sys/.../nr_hugepages), it will fail.
Good point. I should have realized that.
 
So I am not really sure we can go this way. Unless we are willing to accept
that temporary userspace can get ENOMEM if it tries to use a hugepage, which
I do not think it is a good idea.
No, this is not acceptable.
Another way to go would be to make up for the free hugepages to be migrated and
allocate the same amount, but that starts to go down a rabbit hole.

I yet need to give it some more spins, but all in all, I think the easiest way
forward way might be to do something like:

alloc_and_dissolve_huge_page {

   new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
   if (new_page) {
           /*
            * Put the page in the freelist hugepage pool.
            * We might race with someone coming by and grabbing the page,
            * but that is fine since we mark the page as Temporary in case
            * both old and new_page fail to be dissolved, so new_page
            * will be freed when its last reference is gone.
            */
           put_page(new_page);
      
           if (!dissolve_free_huge_page(page)) {
                   /*
                    * Old page could be dissolved.
                    */
                   ret = true;
           } else if (dissolve_free_huge_page(new_page)) {
                  /*
                   * Page might have been dissolved by admin by doing
                   * "echo 0 > /proc/../nr_hugepages". Check it before marking
                   * the page.
                   */
                  spin_lock(&hugetlb_lock);
                  /* Mark the page Temporary in case we fail to dissolve both
                   * the old page and new_page. It will be freed when the last
                   * user drops it.
                   */
                  if (PageHuge(new_page))
                          SetPageHugeTemporary(new_page);
                  spin_unlock(&hugetlb_lock);
           }
   }
OK, this should work but I am really wondering whether it wouldn't be
just simpler to replace the old page by a new one in the free list
directly. Or is there any reason we have to go through the generic
helpers path? I mean something like this

	new_page = alloc_fresh_huge_page();
	if (!new_page)
		goto fail;
	spin_lock(hugetlb_lock);
	if (!PageHuge(old_page)) {
		/* freed from under us, nothing to do */ 
		__update_and_free_page(new_page);
		goto unlock;
	}
	list_del(&old_page->lru);
	__update_and_free_page(old_page);
	__enqueue_huge_page(new_page);
unlock:
	spin_unlock(hugetlb_lock);

This will require to split update_and_free_page and enqueue_huge_page to
counters independent parts but that shouldn't be a big deal. But it will
also protect from any races. Not an act of beauty but seems less hackish
to me.
-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help