Re: [git pull] iov_iter fixes
From: Pavel Begunkov <asml.silence@gmail.com>
Date: 2021-09-09 22:58:21
Also in:
linux-fsdevel
On 9/9/21 11:54 PM, Pavel Begunkov wrote:
On 9/9/21 8:37 PM, Linus Torvalds wrote:quoted
On Wed, Sep 8, 2021 at 9:24 PM Al Viro [off-list ref] wrote:quoted
Fixes for io-uring handling of iov_iter reexpandsUgh. I have pulled this, because I understand what it does and I agree it fixes a bug, but it really feels very very hacky and wrong to me.Maybe was worded not too clearly, my apologies.quoted
It really smells like io-uring is doing a "iov_iter_revert()" using a number that it pulls incorrectly out of its arse.It's not invented by io_uring, filemap.c : generic_file_direct_[write,read]() do the same thing. Also, the block layer was not re-expanding before ~5.12, so it looks it was possible to trigger a similar thing without io_uring, but I haven't tried to reproduce. Was mentioned in the cover-letter.quoted
So when io-uring does that iov_iter_revert(iter, io_size - iov_iter_count(iter)); what it *really* wants to do is just basically "iov_iter_reset(iter)". And that's basically what that addition of that "iov_iter_reexpand()" tries to effectively do. Wouldn't it be better to have a function that does exactly that? Alternatively (and I'm cc'ing Jens) is is not possible for the io-uring code to know how many bytes it *actually* used, rather than saying that "ok, the iter originally had X bytes, now it has Y bytes, so it must have used X-Y bytes" which was actively wrong for the case where something ended up truncating the IO for some reason. Because I note that io-uring does that /* may have left rw->iter inconsistent on -EIOCBQUEUED */ iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter)); in io_resubmit_prep() too, and that you guys missed that it's the exact same issue, and needs that exact same iov_iter_reexpand().Right. It was covered by v1-v2, which were failing requests with additional fallback in v2 [1], but I dropped in v3 [2] because there is a difference. Namely io_resubmit_prep() might be called deeply down the stack, e.g. in the block layer. It was intended to get fixed once the first part is merged, and I do believe that was the right approach, because there were certain communication delays. The first version was posted a month ago, but we missed the merged window. It appeared to me that if we get anything more complex
Dammit, apologies for the teared email. ... It was intended to get fixed once the first part is merged, and I do believe that was the right approach, because there were certain communication delays. The first version was posted a month ago, but we missed the merged window. It appeared to me that if anything more complex is posted, it would take another window to get it done.
[1] https://lkml.org/lkml/2021/8/12/620 [2] https://lkml.org/lkml/2021/8/23/285quoted
That "req->result" is once again the *original* length, and the above code once again mis-handles the case of "oh, the iov got truncated because of some IO limit". So I've pulled this, but I think it is (a) ugly nasty (b) incomplete and misses a case and needs more thought. At the VERY least it needs that iov_iter_reexpand() in io_resubmit_prep() too, I think. I'd like the comments expanded too. In particular that /* some cases will consume bytes even on error returns */ really should expand on the "some cases" thing, and why such an error isn't fatal buye should be retried asynchronously blindly like this? Because I think _that_ is part of the fundamental issue here - the io_uring code tries to just blindly re-submit the whole thing, and it does it very badly and actually incorrectly. Or am I missing something? Linus
-- Pavel Begunkov