Re: [PATCH V13 00/19] block: support multi-page bvec
From: Jens Axboe <axboe@kernel.dk>
Date: 2019-01-15 03:44:59
Also in:
dm-devel, linux-bcache, linux-block, linux-btrfs, linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml
On 1/11/19 4:01 AM, Ming Lei wrote:
Hi,

This patchset brings multi-page bvec into the block layer:

1) what is multi-page bvec?

Multipage bvec means that one 'struct bio_vec' can hold multiple pages which are physically contiguous, instead of the single page used in the Linux kernel for a long time.

2) why is multi-page bvec introduced?

Kent proposed the idea[1] first.

As system RAM becomes much bigger than before, and huge pages, transparent huge pages and memory compaction are widely used, it is now fairly easy to see physically contiguous pages from the fs in I/O. On the other hand, from the block layer's view, it isn't necessary to store intermediate pages in the bvec; it is enough to store just the physically contiguous 'segment' in each io vector.

Also, huge pages are being brought to filesystems and swap [2][6], so we can do IO on a hugepage each time[3], which requires that one bio can transfer at least one huge page at a time. It turns out that simply changing BIO_MAX_PAGES isn't flexible[3][5]. Multipage bvec fits this case very well. As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured much bigger, such as 512, which requires at least two 4K pages for holding the bvec table.

With multi-page bvec:

- Inside the block layer, both bio splitting and sg mapping can become more efficient than before by just traversing the physically contiguous 'segment' instead of each page.

- segment handling in the block layer can be improved much in future, since it should be quite easy to convert a multipage bvec into a segment. For example, we might just store the segment in each bvec directly in future.

- bio size can be increased, and it should improve some high-bandwidth IO cases in theory[4].

- there is an opportunity in future to improve the memory footprint of bvecs.

3) how is multi-page bvec implemented in this patchset?

Patches 1 ~ 4 prepare for supporting multi-page bvec.
Patches 5 ~ 15 implement multipage bvec in the block layer:

- put all tricks into bvec/bio/rq iterators, and as long as drivers and fs use these standard iterators, they are happy with multipage bvec

- introduce bio_for_each_bvec() to iterate over multipage bvec for splitting bio and mapping sg

- keep current bio_for_each_segment*() to iterate over singlepage bvec and make sure current users won't be broken; especially, convert to this new helper prototype in single patch 21 given it is basically a mechanical conversion

- deal with iomap & xfs's sub-pagesize io vec in patch 13

- enable multipage bvec in patch 14

Patch 16 redefines BIO_MAX_PAGES as 256.

Patch 17 documents usages of bio iterator helpers.

Patches 18 ~ 19 kill NO_SG_MERGE.

These patches can be found in the following git tree:

git: https://github.com/ming1/linux.git for-4.21-block-mp-bvec-V12

Lots of tests (blktests, xfstests, ltp io, ...) have been run with this patchset, and no regressions were seen.

Thanks Christoph for reviewing the early version and providing very good suggestions, such as: introduce bio_init_with_vec_table(), remove other unnecessary helpers for cleanup, and so on.

Thanks Christoph and Omar for reviewing V10/V11/V12 and providing lots of helpful comments.
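To make the two iteration granularities concrete, here is a minimal userspace sketch. It is not code from the patchset. `struct toy_bvec`, `struct toy_piece`, `toy_split_bvec()` and `toy_count_pages()` are hypothetical names, page frame numbers stand in for `struct page` pointers, and a 4K page size is assumed. Walking the table entry by entry corresponds roughly to the multipage view that bio_for_each_bvec() is described as giving, while splitting each entry into page-sized pieces corresponds roughly to the singlepage view kept for bio_for_each_segment*().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4K pages */

/* Hypothetical stand-in for the kernel's struct bio_vec. */
struct toy_bvec {
    unsigned long pfn;   /* page frame number of the first page */
    unsigned int len;    /* bytes covered; may span several pages */
    unsigned int offset; /* byte offset relative to the first page */
};

/* One single-page piece carved out of a multi-page bvec. */
struct toy_piece {
    unsigned long pfn;
    unsigned int len;
    unsigned int offset;
};

/*
 * Split one multi-page bvec into single-page pieces, the granularity
 * a per-page walk would see. Stores at most max pieces in out and
 * returns the total number of pieces the bvec contains.
 */
static size_t toy_split_bvec(const struct toy_bvec *bv,
                             struct toy_piece *out, size_t max)
{
    unsigned long pfn = bv->pfn + bv->offset / TOY_PAGE_SIZE;
    unsigned int offset = bv->offset % TOY_PAGE_SIZE;
    unsigned int left = bv->len;
    size_t n = 0;

    assert(bv->len > 0);
    while (left > 0) {
        /* A piece never crosses a page boundary. */
        unsigned int chunk = TOY_PAGE_SIZE - offset;

        if (chunk > left)
            chunk = left;
        if (n < max) {
            out[n].pfn = pfn;
            out[n].len = chunk;
            out[n].offset = offset;
        }
        n++;
        left -= chunk;
        pfn++;
        offset = 0; /* later pieces start at the page boundary */
    }
    return n;
}

/* Total single-page pieces across a table of multi-page bvecs. */
static size_t toy_count_pages(const struct toy_bvec *tbl, size_t nr)
{
    size_t total = 0;

    for (size_t i = 0; i < nr; i++)
        total += toy_split_bvec(&tbl[i], NULL, 0);
    return total;
}
```

In this model, a table with one entry covering three full pages and one entry of 4096 bytes starting at offset 512 has two entries in the multipage view and five pieces in the singlepage view, which is the reduction in traversal work that the cover letter attributes to walking contiguous segments instead of individual pages.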
Thanks for persisting in this endeavor, Ming, I've applied this for 5.1.

--
Jens Axboe