Thread (46 messages) 46 messages, 7 authors, 2021-09-13

Re: [git pull] iov_iter fixes

From: Jens Axboe <axboe@kernel.dk>
Date: 2021-09-09 21:20:00
Also in: linux-fsdevel

On 9/9/21 1:37 PM, Linus Torvalds wrote:
On Wed, Sep 8, 2021 at 9:24 PM Al Viro [off-list ref] wrote:
quoted
        Fixes for io-uring handling of iov_iter reexpands
Ugh.

I have pulled this, because I understand what it does and I agree it
fixes a bug, but it really feels very very hacky and wrong to me.

It really smells like io-uring is doing a "iov_iter_revert()" using a
number that it pulls incorrectly out of its arse.

So when io-uring does that

                iov_iter_revert(iter, io_size - iov_iter_count(iter));

what it *really* wants to do is just basically "iov_iter_reset(iter)".

And that's basically what that addition of that "iov_iter_reexpand()"
tries to effectively do.

Wouldn't it be better to have a function that does exactly that?
That might indeed be better. Alternatively, consumers that truncate
should expand. Part of the problem here is the inconsistency in how they
are consumed.
Alternatively (and I'm cc'ing Jens) is is not possible for the
io-uring code to know how many bytes it *actually* used, rather than
saying that "ok, the iter originally had X bytes, now it has Y bytes,
so it must have used X-Y bytes" which was actively wrong for the case
where something ended up truncating the IO for some reason.
Not sure how we'd do that, outside of stupid tricks like copy the
iov_iter before we pass it down. But that's obviously not going to be
very efficient. Hence we're left with having some way to reset/reexpand,
even in the presence of someone having done truncate on it.
Because I note that io-uring does that

        /* may have left rw->iter inconsistent on -EIOCBQUEUED */
        iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));

in io_resubmit_prep() too, and that you guys missed that it's the
exact same issue, and needs that exact same iov_iter_reexpand().
I think you're right on that one, there's no difference between that use
case and the other two...
That "req->result" is once again the *original* length, and the above
code once again mis-handles the case of "oh, the iov got truncated
because of some IO limit".

So I've pulled this, but I think it is

 (a) ugly nasty

 (b) incomplete and misses a case

and needs more thought. At the VERY least it needs that
iov_iter_reexpand() in io_resubmit_prep() too, I think.

I'd like the comments expanded too. In particular that

                /* some cases will consume bytes even on error returns */
That comment is from me, and it goes back a few years. IIRC, it was the
iomap or xfs code that I hit this with, but honestly I don't remember
all the details at this point. I can try and play with it and see if it
still reproduces.
really should expand on the "some cases" thing, and why such an error
isn't fatal buye should be retried asynchronously blindly like this?
That would certainly make it easier to handle, as we'd never need to
care at that point. Ideally, return 'bytes_consumed' or error. It might
have been a case of -EAGAIN after truncate, I'll have to dig a bit to
find it again. Outside of that error, we don't retry as there's no point
in doing so.
Because I think _that_ is part of the fundamental issue here - the
io_uring code tries to just blindly re-submit the whole thing, and it
does it very badly and actually incorrectly.

Or am I missing something?
I think the key point here is re-figuring out where the
consumption-on-error comes from. If it just ends up being a truncated
iov, that's all good and fine. If not, that feels like a bug somewhere
else that needs fixing.

-- 
Jens Axboe
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help