Thread: 20 messages, 3 authors, 2013-08-12

Re: [RFC 0/3] Add madvise(..., MADV_WILLWRITE)

From: Andy Lutomirski <luto@amacapital.net>
Date: 2013-08-07 17:02:59
Also in: linux-mm, lkml

On Wed, Aug 7, 2013 at 6:40 AM, Jan Kara [off-list ref] wrote:
On Mon 05-08-13 12:43:58, Andy Lutomirski wrote:
My application fallocates and mmaps (shared, writable) a lot (several
GB) of data at startup.  Those mappings are mlocked, and they live on
ext4.  The first write to any given page is slow because
ext4_da_get_block_prep can block.  This means that, to get decent
performance, I need to write something to all of these pages at
startup.  This, in turn, causes a giant IO storm as several GB of
zeros get pointlessly written to disk.
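For illustration, the startup workaround described above could look roughly like the following. This is a minimal sketch, not the application's actual code: the helper name map_and_pretouch is hypothetical, posix_fallocate stands in for whichever fallocate call the application uses, and an mlock failure is simply ignored here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate, map, lock and pre-touch `len` bytes of `fd`.
 * Returns the mapping, or MAP_FAILED on error. */
static void *map_and_pretouch(int fd, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Reserve blocks up front. posix_fallocate returns an errno value
     * directly rather than -1. */
    if (posix_fallocate(fd, 0, (off_t)len) != 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Assumption for this sketch: a failed mlock (for example because of a
     * low RLIMIT_MEMLOCK) is treated as non-fatal. */
    (void)mlock(p, len);

    /* Store one byte per page so the first-write fault happens now instead
     * of on the hot path. Rewriting the existing value still dirties the
     * page, which is what leads to the writeback of zeros later. */
    volatile unsigned char *bytes = p;
    for (size_t off = 0; off < len; off += (size_t)page)
        bytes[off] = bytes[off];

    return p;
}
```

Every page touched this way becomes dirty and is eventually written back, which is the IO storm mentioned above.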

This series is an attempt to add madvise(..., MADV_WILLWRITE) to
signal to the kernel that I will eventually write to the referenced
pages.  It should cause any expensive operations that happen on the
first write to happen immediately, but it should not result in
dirtying the pages.

madvise(addr, len, MADV_WILLWRITE) returns the number of bytes that
the operation succeeded on or a negative error code if there was an
actual failure.  A return value of zero signifies that the kernel
doesn't know how to "willwrite" the range and that userspace should
implement a fallback.

For now, it only works on shared writable ext4 mappings.  Eventually
it should support other filesystems as well as private pages (it
should COW the pages but not cause swap IO) and anonymous pages (it
should COW the zero page if applicable).

The implementation leaves much to be desired.  In particular, it
generates dirty buffer heads on a clean page, and this scares me.

Thoughts?

  One question before I look at the patches: Why don't you use fallocate()
in your application? The functionality you require seems to be pretty
similar to it - writing to an already allocated block is usually quick.

I do use fallocate, and, IIRC, the problem was worse before I added
the fallocate call.

This could be argued to be a filesystem problem -- perhaps
page_mkwrite should never block.  I don't expect that to be fixed any
time soon (if ever).

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .