Thread: 10 messages, 3 authors, 2013-01-04

Re: [RFC v4 0/3] Support volatile for anonymous range

From: Minchan Kim <minchan@kernel.org>
Date: 2012-12-20 01:34:59
Also in: lkml

On Tue, Dec 18, 2012 at 10:27:46AM -0800, Arun Sharma wrote:
> On 12/17/12 10:47 PM, Minchan Kim wrote:
>> I hope for more input from user-space allocator people, and that they
>> test the patch with their allocators, because getting real value might
>> require a design change in arena management.
>
> jemalloc knows how to handle MADV_FREE on platforms that support it.
> This looks similar (we'll need a SIGBUS handler that does the right
> thing = zero the page + mark it as non-volatile in the common case).
That doesn't work, because in the malloc case it is too late to mark the
range non-volatile in the signal handler.

For example,
free(P1-P4) -> mvolatile(P1-P4) -> VM discard(P3) -> alloc(P1-P4) ->
use P1 -> VM discard(P1) -> use P3 -> SIGBUS -> mark nonvolatile ->
lost P1.

P1 was written while the range was still volatile, so the VM discarded it
before the first SIGBUS (on P3) ever reached the handler. So we should call
mnovolatile before handing the free space to the user.
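The sequence above can be replayed in a small user-space model. This is a hypothetical sketch, not the RFC's code: page state lives in an array, `set_volatile()` stands in for mvolatile()/mnovolatile(), `vm_discard()` for reclaim, and a `false` return from `use_page()` stands in for SIGBUS. It also assumes that touching a discarded page after the range is non-volatile simply yields a fresh zero page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the scenario; names are illustrative, not the RFC API. */
#define NPAGES 4

struct page_model {
    bool is_volatile; /* inside a range marked by mvolatile() */
    bool present;     /* still backed by memory (not discarded) */
    int data;         /* last value the application wrote */
};

static struct page_model pages[NPAGES];

/* Stand-in for mvolatile() (v == true) / mnovolatile() (v == false) on [first, last]. */
static void set_volatile(int first, int last, bool v)
{
    assert(0 <= first && first <= last && last < NPAGES);
    for (int i = first; i <= last; i++)
        pages[i].is_volatile = v;
}

/* Stand-in for reclaim: only volatile pages may be discarded. */
static void vm_discard(int i)
{
    if (pages[i].is_volatile) {
        pages[i].present = false;
        pages[i].data = 0;
    }
}

/* Application write; returns false where the kernel would send SIGBUS. */
static bool use_page(int i, int value)
{
    if (!pages[i].present) {
        if (pages[i].is_volatile)
            return false;        /* discarded volatile page: SIGBUS */
        pages[i].present = true; /* assumed: non-volatile hole becomes a zero page */
    }
    pages[i].data = value;
    return true;
}

/*
 * Replay: free(P1-P4) -> mvolatile(P1-P4) -> VM discard(P3) -> alloc(P1-P4)
 *         -> use P1 -> VM discard(P1) -> use P3.
 * eager == true:  the allocator calls mnovolatile() inside alloc().
 * eager == false: the allocator waits for SIGBUS and marks non-volatile there.
 * Returns true if the data written to P1 survives.
 */
static bool run_scenario(bool eager)
{
    memset(pages, 0, sizeof(pages));
    for (int i = 0; i < NPAGES; i++)
        pages[i].present = true;

    set_volatile(0, 3, true);       /* free(P1-P4) -> mvolatile(P1-P4) */
    vm_discard(2);                  /* VM discard(P3) */
    if (eager)
        set_volatile(0, 3, false);  /* alloc(P1-P4) calls mnovolatile() */

    use_page(0, 111);               /* use P1 */
    vm_discard(0);                  /* VM discard(P1): succeeds only if still volatile */
    if (!use_page(2, 333)) {        /* use P3 -> SIGBUS in the lazy case */
        set_volatile(0, 3, false);  /* handler marks the range non-volatile */
        use_page(2, 333);           /* retry succeeds, but P1 is already gone */
    }
    return pages[0].present && pages[0].data == 111;
}
```

In this model `run_scenario(false)` returns false (P1's data is gone) and `run_scenario(true)` returns true: a handler that only reacts to the first fault cannot protect pages the application touched before that fault, which is why the mnovolatile call belongs in the allocation path.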
> All of this of course assumes that apps madvise the kernel through
> APIs exposed by the malloc implementation - not via a raw syscall.
>
> In other words, some new user space code needs to be written to test
> this out fully. Sounds feasible though.

Agreed. I might want to design a new allocator around these system calls
if existing allocators cannot use them efficiently, because that might
require changes to the allocator's design. MADV_FREE/MADV_DONTNEED isn't
cheap, because it has to walk the ptes/page descriptors in the range to
mark them, so I guess allocators avoid calling such advice system calls
frequently, and when they do call them, they prefer to pass a big range
as a batch. That's just my guess, though.

But mvolatile/mnovolatile is cheaper, so an allocator can call it more
frequently on smaller ranges, which gives the VM more pages it can reclaim
easily. Another benefit of mvolatile is that its behavior can change when
memory pressure is severe: it can zap all the pages, like DONTNEED, so it
is very flexible.
The downside of that approach is that calling it on small ranges can
increase the number of VMAs, so we might need a tunable for the VMA size.
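To illustrate why small ranges cost VMAs, here is a simplified model, not kernel code. It assumes, as the remark about VMA counts suggests, that each mvolatile() call splits a VMA at the range boundaries and that adjacent VMAs with identical flags merge again, so the VMA count equals the number of maximal runs of pages sharing the same volatile flag. `count_vmas()` and `mark_volatile()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model, not kernel code: one anonymous mapping with one flag
 * per page. Assumes a VMA boundary wherever the flag changes and that
 * neighbouring VMAs with equal flags are merged, so the VMA count equals
 * the number of maximal runs of equal flags.
 */
static size_t count_vmas(const bool *is_volatile, size_t npages)
{
    assert(npages > 0);
    size_t vmas = 1;
    for (size_t i = 1; i < npages; i++)
        if (is_volatile[i] != is_volatile[i - 1])
            vmas++; /* flag changes: a new VMA starts here */
    return vmas;
}

/* Hypothetical stand-in for mvolatile() on pages [first, first + len). */
static void mark_volatile(bool *is_volatile, size_t npages,
                          size_t first, size_t len)
{
    assert(first <= npages && len <= npages - first);
    for (size_t i = first; i < first + len; i++)
        is_volatile[i] = true;
}
```

In this model, marking every other page of a 16-page mapping with separate one-page calls yields 16 VMAs, while a single call over pages 4..11 yields 3; that is the trade-off against mvolatile's lower per-call cost.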
> Thanks!
>  -Arun
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
-- 
Kind regards,
Minchan Kim
