Re: [git pull] iov_iter fixes
From: Jens Axboe <axboe@kernel.dk>
Date: 2021-09-09 21:20:00
Also in:
linux-fsdevel
On 9/9/21 1:37 PM, Linus Torvalds wrote:
On Wed, Sep 8, 2021 at 9:24 PM Al Viro [off-list ref] wrote:quoted
Fixes for io-uring handling of iov_iter reexpandsUgh. I have pulled this, because I understand what it does and I agree it fixes a bug, but it really feels very very hacky and wrong to me. It really smells like io-uring is doing a "iov_iter_revert()" using a number that it pulls incorrectly out of its arse. So when io-uring does that iov_iter_revert(iter, io_size - iov_iter_count(iter)); what it *really* wants to do is just basically "iov_iter_reset(iter)". And that's basically what that addition of that "iov_iter_reexpand()" tries to effectively do. Wouldn't it be better to have a function that does exactly that?
That might indeed be better. Alternatively, consumers that truncate should expand. Part of the problem here is the inconsistency in how they are consumed.
Alternatively (and I'm cc'ing Jens) is is not possible for the io-uring code to know how many bytes it *actually* used, rather than saying that "ok, the iter originally had X bytes, now it has Y bytes, so it must have used X-Y bytes" which was actively wrong for the case where something ended up truncating the IO for some reason.
Not sure how we'd do that, outside of stupid tricks like copy the iov_iter before we pass it down. But that's obviously not going to be very efficient. Hence we're left with having some way to reset/reexpand, even in the presence of someone having done truncate on it.
Because I note that io-uring does that
/* may have left rw->iter inconsistent on -EIOCBQUEUED */
iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
in io_resubmit_prep() too, and that you guys missed that it's the
exact same issue, and needs that exact same iov_iter_reexpand().I think you're right on that one, there's no difference between that use case and the other two...
That "req->result" is once again the *original* length, and the above
code once again mis-handles the case of "oh, the iov got truncated
because of some IO limit".
So I've pulled this, but I think it is
(a) ugly nasty
(b) incomplete and misses a case
and needs more thought. At the VERY least it needs that
iov_iter_reexpand() in io_resubmit_prep() too, I think.
I'd like the comments expanded too. In particular that
/* some cases will consume bytes even on error returns */That comment is from me, and it goes back a few years. IIRC, it was the iomap or xfs code that I hit this with, but honestly I don't remember all the details at this point. I can try and play with it and see if it still reproduces.
really should expand on the "some cases" thing, and why such an error isn't fatal buye should be retried asynchronously blindly like this?
That would certainly make it easier to handle, as we'd never need to care at that point. Ideally, return 'bytes_consumed' or error. It might have been a case of -EAGAIN after truncate, I'll have to dig a bit to find it again. Outside of that error, we don't retry as there's no point in doing so.
Because I think _that_ is part of the fundamental issue here - the io_uring code tries to just blindly re-submit the whole thing, and it does it very badly and actually incorrectly. Or am I missing something?
I think the key point here is re-figuring out where the consumption-on-error comes from. If it just ends up being a truncated iov, that's all good and fine. If not, that feels like a bug somewhere else that needs fixing. -- Jens Axboe