Thread: 21 messages, 6 authors, 2015-04-04

Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

From: Milosz Tanski <hidden>
Date: 2015-03-27 16:38:48
Also in: linux-arch, linux-fsdevel, lkml

On Fri, Mar 27, 2015 at 11:58 AM, Jeremy Allison [off-list ref] wrote:
> On Fri, Mar 27, 2015 at 02:01:59AM -0700, Andrew Morton wrote:
> > On Fri, 27 Mar 2015 01:48:33 -0700 Christoph Hellwig [off-list ref] wrote:
> > > On Fri, Mar 27, 2015 at 01:35:16AM -0700, Andrew Morton wrote:
> > > > fincore() doesn't have to be ugly.  Please address the design issues I
> > > > raised.  How is pread2() useful to the class of applications which
> > > > cannot proceed until all data is available?
> > >
> > > It actually makes them work correctly?  preadv2( ..., DONTWAIT) will
> > > return -EAGAIN, which causes them to bounce to the threadpool where
> > > they call preadv(...).
> >
> > (I assume you mean RWF_NONBLOCK)
> >
> > That isn't how pread2() works.  If the leading one or more pages are
> > uptodate, pread2() will return a partial read.  Now what?  Either the
> > application reads the same data a second time via the worker thread
> > (dumb, but it will usually be a rare case)
>
> The problem with the above is that we can't tell the difference
> between pread2() returning a short read because the pages are not
> in cache, or because someone truncated the file. So we need some
> way to differentiate this.
>
> My preference from userspace would be for pread2() to return
> EAGAIN if *all* the data requested is not available (where
> 'all' can be less than the size requested if the file has
> been truncated in the meantime).
>
> So:
>
> ret = pread2(fd, buf, size_wanted, RWF_NONBLOCK)
>
> if (ret == -1) {
>         if (errno == EAGAIN) {
>                 goto threadpool...
>         }
>         .. real error..
> }
>
> if (ret == size_wanted) {
>         .. normal read, file not truncated...
> }
>
> if (ret < size_wanted) {
>         .. file was truncated..
> }
>
> The thing I want to avoid is the case where
> ret < size_wanted means only part of the file
> is in cache.

I very much like the short read behavior. It lets you overlap some CPU
work on the partial data (like TLS, then sticking it in the network
output buffer) with waiting for the rest of the data (enqueued in the
thread pool).
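
A minimal sketch of the bookkeeping that overlap implies, assuming the
caller has already made the non-blocking read and got back `got` bytes.
The names `struct pending_range` and `split_short_read` are hypothetical
and not part of the patch series; the sketch only computes which range
would still have to be handed to the thread pool.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical: the range a blocking read in the thread pool still has to fetch. */
struct pending_range {
    off_t  off;   /* file offset where the blocking read should resume */
    size_t len;   /* bytes still wanted; 0 means nothing left to enqueue */
};

/*
 * Given a non-blocking read that returned `got` bytes (got <= want)
 * starting at file offset `off`, return the remaining range. The first
 * `got` bytes of the caller's buffer can be processed immediately while
 * the remainder is read by a worker.
 */
static struct pending_range split_short_read(off_t off, size_t want, size_t got)
{
    assert(got <= want);
    struct pending_range rest = { off + (off_t)got, want - got };
    return rest;
}
```

With this, a caller would start the CPU work on the first `got` bytes and
enqueue `rest` only when `rest.len` is non-zero.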

Short reads are the current behavior. If you call preadv2 a second
time around at EOF, it'll return 0 instead of EWOULDBLOCK today. I
actually test for this in the preadv2 test in xfstests here:
https://github.com/mtanski/xfstests/commit/688db24c292999c81ee17caf2b61fe8cf7bb3cd6#diff-114416ea98ce29dde3b5b3d145afbd2bR81.

There's one caveat: it's possible to get EWOULDBLOCK when reading
at end of file if the file metadata is not paged in.
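
One way a caller might interpret those return values, as a hedged
sketch: the enum and `classify_nb_read` are hypothetical names, and the
function takes the return value and errno of a non-blocking read that
the caller has already made. Because of the caveat above,
NB_NEED_BLOCKING does not promise that more data exists; the blocking
read in the thread pool may itself return 0 at EOF.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome codes for a non-blocking page-cache read. */
enum nb_outcome {
    NB_DATA,           /* ret > 0: some bytes came from the page cache */
    NB_EOF,            /* ret == 0: at end of file */
    NB_NEED_BLOCKING,  /* EAGAIN/EWOULDBLOCK: retry with a blocking read */
    NB_ERROR           /* any other errno: a real error */
};

static enum nb_outcome classify_nb_read(ssize_t ret, int err)
{
    /* libc syscall wrappers report failure as -1 with errno set. */
    assert(ret >= 0 || ret == -1);

    if (ret > 0)
        return NB_DATA;
    if (ret == 0)
        return NB_EOF;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return NB_NEED_BLOCKING;   /* may still turn out to be EOF */
    return NB_ERROR;
}
```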

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com
