Re: [PATCH v12 00/10] iov_iter: Improve page extraction (pin or just list)
From: Jens Axboe <axboe@kernel.dk>
Date: 2023-02-07 18:49:38
Also in: linux-fsdevel, linux-mm, lkml
On 2/7/23 10:12 AM, David Howells wrote:
Hi Jens, Al, Christoph,
Here are patches to provide support for extracting pages from an iov_iter
and to use this in the extraction functions in the block layer bio code.
The patches make the following changes:
(1) Change generic_file_splice_read() to load up an ITER_BVEC iterator
with sufficient pages and use that rather than using an ITER_PIPE.
This avoids a problem[2] when __iomap_dio_rw() calls iov_iter_revert()
to shorten an iterator when it races with truncation. The reversion
causes the pipe iterator to prematurely release the pages it was
retaining - despite the read still being in progress. This caused
memory corruption.
(2) Remove ITER_PIPE and its paraphernalia as generic_file_splice_read()
was the only user.
(3) Add a function, iov_iter_extract_pages() to replace
iov_iter_get_pages*() that gets refs, pins or just lists the pages as
appropriate to the iterator type.
Add a function, iov_iter_extract_will_pin() that will indicate from
the iterator type how the cleanup is to be performed, returning true
if the pages will need unpinning, false otherwise.
(4) Make the bio struct carry a pair of flags to indicate the cleanup
mode. BIO_NO_PAGE_REF is replaced with BIO_PAGE_REFFED (indicating
FOLL_GET was used) and BIO_PAGE_PINNED (indicating FOLL_PIN was used)
is added.
BIO_PAGE_REFFED will go away, but at the moment fs/direct-io.c sets it
and this series does not fully address that file.
(5) Add a function, bio_release_page(), to release a page appropriately to
the cleanup mode indicated by the BIO_PAGE_* flags.
(6) Make the iter-to-bio code use iov_iter_extract_pages() to retain the
pages appropriately and clean them up later.
(7) Fix bio_flagged() so that it doesn't prevent a gcc optimisation.

I've updated the for-6.3/iov-extract branch and the for-next branch. This
isn't done to bypass any review, just so we can get some more testing on
this (and because the old one is known broken).

-- 
Jens Axboe
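
For readers following items (3) to (5) of the quoted cover letter, here is a rough userspace model of how the cleanup mode might be chosen from the iterator type and then honoured on release. This is only a sketch: every model_* and MODEL_* name is hypothetical, the page counters are stand-ins for real page state, and the rule that only user-backed iterators (UBUF/IOVEC) get pinned is an assumption rather than something the cover letter states.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: nothing here is kernel code. */
enum model_iter_type {
	MODEL_ITER_UBUF,
	MODEL_ITER_IOVEC,
	MODEL_ITER_BVEC,
	MODEL_ITER_KVEC,
};

/* Modelled on the BIO_PAGE_REFFED / BIO_PAGE_PINNED pair from item (4). */
#define MODEL_BIO_PAGE_REFFED (1u << 0)	/* a plain ref was taken (FOLL_GET) */
#define MODEL_BIO_PAGE_PINNED (1u << 1)	/* the page was pinned (FOLL_PIN) */

struct model_page {
	int refcount;
	int pincount;
};

struct model_bio {
	unsigned int flags;
};

/*
 * Models iov_iter_extract_will_pin(): true if extracted pages will need
 * unpinning. Assumption: only user-backed iterators are pinned.
 */
bool model_extract_will_pin(enum model_iter_type type)
{
	return type == MODEL_ITER_UBUF || type == MODEL_ITER_IOVEC;
}

/* Record the cleanup mode on the bio at extraction time. */
void model_bio_set_cleanup_mode(struct model_bio *bio,
				enum model_iter_type type)
{
	bio->flags &= ~(MODEL_BIO_PAGE_REFFED | MODEL_BIO_PAGE_PINNED);
	if (model_extract_will_pin(type))
		bio->flags |= MODEL_BIO_PAGE_PINNED;
}

/* Models bio_release_page(): release according to the recorded mode. */
void model_bio_release_page(const struct model_bio *bio,
			    struct model_page *page)
{
	if (bio->flags & MODEL_BIO_PAGE_PINNED) {
		assert(page->pincount > 0);
		page->pincount--;	/* stands in for unpin_user_page() */
	} else if (bio->flags & MODEL_BIO_PAGE_REFFED) {
		assert(page->refcount > 0);
		page->refcount--;	/* stands in for put_page() */
	}
	/* Neither flag set: the page was only listed, nothing to drop. */
}
```

The point of the model is that the release path never inspects the iterator again; it relies only on the flags recorded when the pages were extracted, which is why the bio carries them.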