Re: [RFC PATCH 1/8] hugetlb: add per-hstate mutex to synchronize user adjustments
From: Mike Kravetz <hidden>
Date: 2021-03-22 16:59:07
Also in:
lkml
On 3/22/21 6:59 AM, Michal Hocko wrote:
> On Fri 19-03-21 15:42:02, Mike Kravetz wrote:
>> The number of hugetlb pages can be adjusted by writing to the sysfs/proc files nr_hugepages, nr_hugepages_mempolicy or nr_overcommit_hugepages. There is nothing to prevent two concurrent modifications via these files. The underlying routine set_max_huge_pages() makes assumptions that only one occurrence is running at a time. Specifically, alloc_pool_huge_page uses a hstate specific variable without any synchronization.
>
> From the above it is not really clear whether the unsynchronized nature of set_max_huge_pages is really a problem or a mere annoyance. I suspect the latter because counters are properly synchronized with the hugetlb_lock. It would be great to clarify that.

It is a problem and an annoyance.

The problem is that alloc_pool_huge_page -> for_each_node_mask_to_alloc is called after dropping the hugetlb lock. for_each_node_mask_to_alloc uses the helper hstate_next_node_to_alloc, which uses and modifies h->next_nid_to_alloc. Worst case would be two instances of set_max_huge_pages trying to allocate pages on different sets of nodes. Pages could get allocated on the wrong nodes.

I really doubt this problem has ever been experienced in practice. However, when looking at the code it was a real annoyance. :)

I'll update the commit message to be more clear.
-- 
Mike Kravetz