Thread (54 messages, 9 authors, 2018-07-19)

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

From: Martin Wilck <hidden>
Date: 2018-07-19 12:23:53

On Thu, 2018-07-19 at 12:45 +0200, Jan Kara wrote:
> On Thu 19-07-18 11:39:18, Martin Wilck wrote:
>> bio_iov_iter_get_pages() returns pages for only a single non-empty
>> segment of the input iov_iter's iovec. This may be much less than the
>> number of pages __blkdev_direct_IO_simple() is supposed to process. Call
>> bio_iov_iter_get_pages() repeatedly until either the requested number
>> of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done,
>> short writes or reads may occur for direct synchronous IOs with multiple
>> iovec slots (such as generated by writev()). In that case,
>> __generic_file_write_iter() falls back to buffered writes, which has
>> been observed to cause data corruption in certain workloads.
>>
>> Note: if segments aren't page-aligned in the input iovec, this patch may
>> result in multiple adjacent slots of the bi_io_vec array referencing the
>> same page (the byte ranges are guaranteed to be disjoint if the
>> preceding patch is applied). We haven't seen problems with that in our
>> tests or the customer's. It'd be possible to detect this situation and
>> merge bi_io_vec slots that refer to the same page, but I prefer to keep
>> it simple for now.

>> Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
>> Signed-off-by: Martin Wilck <redacted>
>> ---
>>  fs/block_dev.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 0dd87aa..41643c4 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>>  
>>  	ret = bio_iov_iter_get_pages(&bio, iter);
>>  	if (unlikely(ret))
>> -		return ret;
>> +		goto out;
>> +
>> +	while (ret == 0 &&
>> +	       bio.bi_vcnt < bio.bi_max_vecs &&
>> +	       iov_iter_count(iter) > 0)
>> +		ret = bio_iov_iter_get_pages(&bio, iter);
>> +
>
> I have two suggestions here (posting them now in public):
>
> Condition bio.bi_vcnt < bio.bi_max_vecs should always be true - we made
> sure we have enough vecs for pages in iter. So I'd WARN if this isn't
> true.

Yeah. I wanted to add that to the patch. Slipped through, somehow.
Sorry about that.

> Secondly, I don't think it is good to discard the error from
> bio_iov_iter_get_pages() here and just submit partial IO. It will again
> lead to part of the IO being done as direct and part attempted to be
> done as buffered. Also, the "slow" direct IO path in
> __blkdev_direct_IO() behaves differently - it aborts and returns an
> error if bio_iov_iter_get_pages() ever returned an error. IMO we should
> do the same here.

Well, it aborts the loop, but then (in the sync case) it still waits
for the already submitted IOs to finish. Here, too, I'd find it more
logical to return the number of successfully transmitted bytes rather
than an error code. In the async case, the submitted bios are left in
place, and will probably sooner or later finish, changing iocb->ki_pos.

I'm actually not quite certain that's correct. In the sync case, it
causes the already-performed IO to be done again, buffered. In the
async case, it may even cause two IOs for the same range to be in
flight at the same time...?

Martin

-- 
Dr. Martin Wilck [off-list ref], Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)