Thread: 4 messages, 2 authors, 2021-02-18

Re: [RFC] Hugepage collapse in process context

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-18 08:39:39
Also in: linux-mm

On Thu 18-02-21 08:11:13, Song Liu wrote:
On Feb 16, 2021, at 8:24 PM, David Rientjes [off-list ref] wrote:

Hi everybody,

Khugepaged is slow by default, it scans at most 4096 pages every 10s.  
That's normally fine as a system-wide setting, but some applications would 
benefit from a more aggressive approach (as long as they are willing to 
pay for it).
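
To put the default rate in perspective, here is a rough back-of-the-envelope sketch. It assumes 4 KiB base pages and that the two numbers above correspond to the khugepaged sysfs tunables pages_to_scan and scan_sleep_millisecs. The helper names are made up for illustration, and the estimate ignores the time spent scanning itself, so it is an optimistic bound rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on pages scanned per second: one pass of pages_to_scan
 * pages, then a sleep of scan_sleep_ms milliseconds. Scan time itself
 * is ignored (assumption), so the real rate is somewhat lower.
 */
static double khugepaged_pages_per_sec(uint64_t pages_to_scan,
                                       uint64_t scan_sleep_ms)
{
    assert(scan_sleep_ms > 0);
    return (double)pages_to_scan * 1000.0 / (double)scan_sleep_ms;
}

/* Seconds needed to scan region_bytes once at that rate. */
static double seconds_to_scan(uint64_t region_bytes, uint64_t page_size,
                              uint64_t pages_to_scan, uint64_t scan_sleep_ms)
{
    assert(page_size > 0);
    double pages = (double)region_bytes / (double)page_size;
    return pages / khugepaged_pages_per_sec(pages_to_scan, scan_sleep_ms);
}
```

With the defaults (4096 pages, 10000 ms) and 4 KiB pages, seconds_to_scan(1UL << 30, 4096, 4096, 10000) comes out at about 640, i.e. more than ten minutes for a single pass over 1 GiB.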

Instead of adding priorities for eligible ranges of memory to khugepaged, 
temporarily speeding khugepaged up for the whole system, or sharding its 
work for memory belonging to a certain process, one approach would be to 
allow userspace to induce hugepage collapse.

The benefit to this approach would be that this is done in process context 
so its cpu is charged to the process that is inducing the collapse.  
Khugepaged is not involved.

Idea was to allow userspace to induce hugepage collapse through the new 
process_madvise() call.  This allows us to collapse hugepages on behalf of 
current or another process for a vectored set of ranges.

This could be done through a new process_madvise() mode *or* it could be a 
flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter 
to be passed.  For example, MADV_F_SYNC.

When done, this madvise call would allocate a hugepage on the right node 
and attempt to do the collapse in process context just as khugepaged would 
otherwise do.
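
For illustration only, a caller of the flag variant might look roughly like the sketch below. MADV_F_SYNC is just the name floated above: no such flag exists, and the value used here is invented. The sketch also assumes 2 MiB huge pages, and the helper names are hypothetical. A kernel that does not implement the proposal would be expected to reject the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
#define SYS_process_madvise 440 /* fallback for old headers (generic number) */
#endif

/* HYPOTHETICAL: name taken from the proposal, value invented for this sketch. */
#define MADV_F_SYNC_HYPOTHETICAL 0x1u

#define HPAGE_SIZE (2UL << 20) /* assumption: 2 MiB PMD-sized huge pages */

/*
 * Shrink [addr, addr + len) to the largest hugepage-aligned subrange.
 * Returns 1 and fills *out if at least one huge page fits, else 0.
 * Overflow of addr + len is not checked in this sketch.
 */
static int hpage_align_range(uintptr_t addr, size_t len, struct iovec *out)
{
    assert(out != NULL);
    uintptr_t start = (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    uintptr_t end = (addr + len) & ~(HPAGE_SIZE - 1);

    if (end <= start)
        return 0;
    out->iov_base = (void *)start;
    out->iov_len = end - start;
    return 1;
}

/*
 * Ask for a synchronous collapse of the given ranges in the process
 * referred to by pidfd. Returns -1 with errno set on failure.
 */
static ssize_t request_collapse(int pidfd, const struct iovec *vec, size_t vlen)
{
    return syscall(SYS_process_madvise, pidfd, vec, vlen,
                   MADV_HUGEPAGE, MADV_F_SYNC_HYPOTHETICAL);
}
```

Since the work would run in the caller's context, a caller would presumably build one iovec per eligible range with hpage_align_range() and pass them all in a single request_collapse() call, treating a failure as "leave it to khugepaged".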

This is a very interesting idea. One question, IIUC, the user process will
block until all small pages in given ranges are collapsed into THPs.

Do you mean that PF would be blocked due to exclusive mmap_sem? Or is
there anything else you have in mind?

What would happen if the memory is so fragmented that we cannot allocate that
many huge pages? Do we need some failover mechanisms?

IIRC khugepaged preallocates pages without holding any locks and I would
expect the same will be done for madvise as well.
-- 
Michal Hocko
SUSE Labs