Re: [PATCHSET v3 0/3] Add ability to save/restore iov_iter state

From: Jens Axboe <axboe@kernel.dk>
Date: 2021-09-16 01:16:00
Also in: linux-fsdevel

On 9/15/21 4:42 PM, Jens Axboe wrote:

On 9/15/21 1:40 PM, Jens Axboe wrote:


On 9/15/21 1:26 PM, Linus Torvalds wrote:


On Wed, Sep 15, 2021 at 11:46 AM Jens Axboe [off-list ref] wrote:


The usual tests do end up hitting the -EAGAIN path quite easily for
certain device types, but not the short read/write.

No way to do something like "read in file to make sure it's cached,
then invalidate caches from position X with POSIX_FADV_DONTNEED, then
do a read that crosses that cached/uncached boundary"?

To at least verify that "partly synchronous, but partly punted to
async" case?

Or were you talking about some other situation?
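The suggested scenario can be sketched in C roughly as follows. This is a minimal sketch, not part of liburing or the test described later in the thread. The helper name `read_across_boundary()` is hypothetical, and it assumes a regular file whose page cache honours POSIX_FADV_DONTNEED; the advice is best effort, so the tail may still be cached.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>   /* posix_fadvise, POSIX_FADV_DONTNEED */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>  /* pread, fdatasync */

/* Hypothetical helper: cache [off, off+len), drop the page cache from
 * drop_off onward, then read the range again so the second read
 * straddles the cached/uncached boundary.
 * Returns the second read's result, or -1 on error. */
static ssize_t read_across_boundary(int fd, void *buf, size_t len, off_t off,
                                    off_t drop_off)
{
    /* The boundary should fall inside the range being read. */
    assert(drop_off >= off && drop_off <= off + (off_t)len);

    /* 1) Read the whole range once so it lands in the page cache. */
    if (pread(fd, buf, len, off) < 0)
        return -1;
    /* 2) DONTNEED does not drop dirty pages, so write them back first. */
    if (fdatasync(fd) < 0)
        return -1;
    /* 3) Drop cached pages from drop_off to EOF (len 0 means "to EOF").
     *    posix_fadvise() returns an error number instead of setting errno. */
    if (posix_fadvise(fd, drop_off, 0, POSIX_FADV_DONTNEED) != 0)
        return -1;
    /* 4) Re-read: the head should be cached, the tail most likely not. */
    return pread(fd, buf, len, off);
}
```

Comparing the second read against the data originally written is what turns this into a verification test rather than just an exercise of the code path.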

No, that covers some of it, and that happens naturally with buffered IO.
The typical case is -EAGAIN on the first try, then you get part or all
of it on the next loop, and then you're either done or you continue. I
tend to run fio verification workloads for that, as you get all the
flexibility of fio with the data verification. And there are tests in
there that run DONTNEED in parallel with buffered IO, exactly to catch
some of these cases. But they don't verify the data, generally.
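From userspace, the retry pattern described above (-EAGAIN on the first attempt, then a partial or complete read on the next pass) looks roughly like this. It's a minimal sketch with a hypothetical `pread_full()` helper. Retrying immediately on EAGAIN/EINTR is only reasonable in a sketch; a real loop on a non-blocking fd would wait for readiness instead of spinning.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: keep reading until len bytes are read, EOF is hit,
 * or a hard error occurs. Returns bytes read, or -1 on a hard error. */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t off)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t ret = pread(fd, p + done, len - done, off + (off_t)done);
        if (ret < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;       /* nothing transferred: retry same position */
            return -1;          /* hard error */
        }
        if (ret == 0)
            break;              /* EOF: return what we have so far */
        done += (size_t)ret;    /* short or full read: advance and continue */
    }
    return (ssize_t)done;
}
```

The key invariant is that the position only moves by the bytes actually transferred, which is exactly what has to stay correct on the kernel side when a request is partially completed and then punted.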

In that sense buffered is a lot easier than O_DIRECT, as it's easier to
provoke these cases. And that does hit all the save/restore parts and
looping, and if you do it with registered buffers then you get to work
with bvec iter as well. O_DIRECT may get you -EAGAIN for low queue depth
devices, but it'll never do a short read/write after that. 

But that's not in the regression tests. I'll write a test case for it
that can go with the liburing regression tests.

OK I wrote one, quick'n dirty. It's written as a liburing test, which
means it can take no arguments (in which case it creates a 128MB file),
or it can take an argument and it'll use that argument as the file. We
fill the first 128MB of the file with known data, basically each
location's own file offset. Then we read it back in each of the following ways:

1) Using non-vectored read
2) Using vectored read, segments that fit in UIO_FASTIOV
3) Using vectored read, segments larger than UIO_FASTIOV

This catches all the different cases for a read.
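The three read variants can be approximated with plain pread()/preadv() as below. This sketch is not the liburing test, which issues its reads through io_uring. `read_mode()` and `FASTIOV_GUESS` are hypothetical names, and the value 8 is assumed to match the kernel's UIO_FASTIOV: the number of iovecs the kernel handles in an on-stack array before it has to allocate one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec, preadv */
#include <unistd.h>    /* pread */

/* Assumed to equal the kernel's UIO_FASTIOV. */
#define FASTIOV_GUESS 8

/* Hypothetical helper: read len bytes at off into buf.
 * nsegs == 0          -> non-vectored pread()
 * nsegs <= 8 (guess)  -> vectored read that fits the on-stack iovec array
 * nsegs >  8 (guess)  -> vectored read that forces an iovec allocation */
static ssize_t read_mode(int fd, char *buf, size_t len, off_t off, int nsegs)
{
    if (nsegs == 0)
        return pread(fd, buf, len, off);

    struct iovec *iov = calloc((size_t)nsegs, sizeof(*iov));
    if (!iov)
        return -1;
    size_t seg = len / (size_t)nsegs;
    for (int i = 0; i < nsegs; i++) {
        iov[i].iov_base = buf + (size_t)i * seg;
        /* The last segment absorbs any remainder. */
        iov[i].iov_len = (i == nsegs - 1) ? len - (size_t)i * seg : seg;
    }
    ssize_t ret = preadv(fd, iov, nsegs, off);
    free(iov);
    return ret;
}
```

Splitting one buffer into equal segments keeps the expected data identical across all three modes, so a single comparison routine can check every variant.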

We do that with both buffered and O_DIRECT, and before each pass, we
randomly DONTNEED (POSIX_FADV_DONTNEED) either the first, middle, or end
part of each segment of the read.
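One way to pick "the first, middle, or end part" of a segment is sketched below. The quarter-of-a-segment size, the `drop_where` enum, and both helper names are assumptions for illustration only. The real test may size and place the dropped range differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* posix_fadvise, POSIX_FADV_DONTNEED */
#include <stdlib.h>     /* rand */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: which part of a segment to drop from the page cache. */
enum drop_where { DROP_FIRST, DROP_MIDDLE, DROP_END };

/* Compute the byte range to drop: a quarter of the segment (assumed size),
 * or the whole segment if it is shorter than 4 bytes. */
static void drop_range(off_t seg_off, off_t seg_len, enum drop_where where,
                       off_t *off, off_t *len)
{
    assert(seg_len >= 0);
    *len = seg_len >= 4 ? seg_len / 4 : seg_len;
    *off = seg_off;                                  /* DROP_FIRST */
    if (where == DROP_MIDDLE)
        *off = seg_off + (seg_len - *len) / 2;
    else if (where == DROP_END)
        *off = seg_off + seg_len - *len;
}

/* Drop a randomly chosen part of one segment. Returns posix_fadvise()'s
 * result: 0 on success, otherwise an error number (errno is not set). */
static int drop_random_part(int fd, off_t seg_off, off_t seg_len)
{
    off_t off, len;

    drop_range(seg_off, seg_len, (enum drop_where)(rand() % 3), &off, &len);
    if (len == 0)
        return 0;   /* len 0 would mean "to end of file", so skip it */
    return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}
```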

I ran this on my laptop, and I found this:
axboe@p1 ~/gi/liburing (master)> test/file-verify                                0.100s
bad read 229376, read 3
Buffered novec test failed
axboe@p1 ~/gi/liburing (master)> test/file-verify                                0.213s
bad read 294912, read 0
Buffered novec test failed

which is because I'm running the iov_iter.2 stuff, and we're hitting
the double accounting issue that I mentioned in the cover letter for
this series. That's why the read returns more than we asked for
(128K). Running it on the current branch passes:
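A verifier for offset-filled data can catch both kinds of breakage mentioned here: wrong contents, and a read that returns more bytes than were requested. This sketch assumes each 4-byte word holds its own (truncated) file offset as a uint32_t and that reads start at 4-byte-aligned offsets. The actual liburing test may lay out its data differently, and `fill_pattern()`/`verify_read()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Assumed layout: every 4-byte word holds its own file offset. */
static void fill_pattern(void *buf, size_t len, off_t file_off)
{
    unsigned char *p = buf;
    for (size_t i = 0; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
        uint32_t v = (uint32_t)(file_off + (off_t)i);
        memcpy(p + i, &v, sizeof(v));   /* memcpy avoids unaligned stores */
    }
}

/* Returns 0 if a read of `want` bytes at file_off that returned `got`
 * bytes is plausible and its data matches; -1 otherwise. A result larger
 * than requested is always a bug. A trailing partial word is not checked. */
static int verify_read(const void *buf, ssize_t got, size_t want, off_t file_off)
{
    if (got < 0 || (size_t)got > want) {
        fprintf(stderr, "read at %lld returned %zd, asked for %zu\n",
                (long long)file_off, got, want);
        return -1;
    }
    const unsigned char *p = buf;
    for (size_t i = 0; i + sizeof(uint32_t) <= (size_t)got; i += sizeof(uint32_t)) {
        uint32_t v;
        memcpy(&v, p + i, sizeof(v));
        if (v != (uint32_t)(file_off + (off_t)i)) {
            fprintf(stderr, "data mismatch at offset %lld\n",
                    (long long)(file_off + (off_t)i));
            return -1;
        }
    }
    return 0;
}
```

Because every word encodes its own position, a mismatch also reveals where the wrong data came from, which helps distinguish a stale iterator from plain corruption.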

[root@archlinux liburing]# for i in $(seq 10); do test/file-verify; done
[root@archlinux liburing]# 

(this is in my test vm that I run on the laptop for kernel testing,
hence the root and different hostname).

I will add this as a liburing regression test case. Probably needs a bit
of cleaning up first, it was just a quick prototype as I thought your
suggestion was a good one. Will probably change it to run at a higher
queue depth than just the 1 it does now.

Cleaned it up a bit, and added registered buffer support (which is
another variant of non-vectored reads) as well as queued IO support:

https://git.kernel.dk/cgit/liburing/commit/?id=6ab387dab745aff2af760d9fed56a4154669edec

and it's now part of the regular testing. Here's my usual run:

Running test file-verify                                            3 sec
Running test file-verify /dev/nvme0n1p2                             3 sec
Running test file-verify /dev/nvme1n1p1                             3 sec
Running test file-verify /dev/sdc2                                  Test file-verify timed out (may not be a failure)
Running test file-verify /dev/dm-0                                  3 sec
Running test file-verify /data/file                                 3 sec

Note that the sdc2 timeout isn't a failure, it's just that emulation on
qemu is slow enough that it takes 1min20s to run and I time out tests
after 60s in the harness to prevent something stalling forever.

-- 
Jens Axboe
