Re: [PATCH v5 02/10] block: Add copy offload support infrastructure
From: Ming Lei <hidden>
Date: 2022-11-24 00:05:20
Also in:
dm-devel, linux-fsdevel, linux-nvme, lkml
On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > Introduce blkdev_issue_copy which supports source and destination
> > > bdevs, and an array of (source, destination and copy length) tuples.
> > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > bio pair with a token as payload and submit it to the device in
> > > order. The read request populates the token with source-specific
> > > information, which is then passed with the write request.
> > > This design is courtesy of Mikulas Patocka's token based copy.
> >
> > I thought this patchset is just for enabling copy command which is
> > supported by hardware. But turns out it isn't, because
> > blk_copy_offload() still submits read/write bios for doing the copy.
> >
> > I am just wondering why not let copy_file_range() cover this kind of
> > copy, and the framework has been there.
>
> Main goal was to enable copy command, but community suggested to add
> copy emulation as well.
>
> blk_copy_offload - actually issues copy command in driver layer.
> The way read/write BIOs are perceived is different for copy offload.
> In copy offload we check the REQ_COPY flag in the NVMe driver layer to
> issue the copy command. But we missed adding that check to other
> drivers, where these bios might be treated as normal READ/WRITE.
>
> blk_copy_emulate - is used if we fail or if the device doesn't support
> the native copy offload command. Here we do READ/WRITE. Using
> copy_file_range for emulation might be possible, but we see 2 issues
> here.
> 1. We explored the possibility of pulling dm-kcopyd into the block
> layer so that we can readily use it. But we found it had many
> dependencies on the dm layer, so we later dropped that idea.
Is it just because dm-kcopyd supports async copy? If yes, I believe we can rely on io_uring for implementing async copy_file_range, which would be a generic interface for async copy, and could get better perf.
> 2. copy_file_range: for block devices at least, we saw a few checks
> which fail it for a raw block device. At this point I don't know much
> about the history of why such a check is present.
Got it, but IMO the check in generic_copy_file_checks() can be relaxed to
cover blkdev, because splice does support blkdev.

Then your bdev offload copy work can be simplified into:

1) implement .copy_file_range for def_blk_fops, suppose it is
   blkdev_copy_file_range()

2) inside blkdev_copy_file_range()
   - if the bdev supports offload copy, just submit one bio to the
     device, and this will be converted to one pt req to device
   - otherwise, fallback to generic_copy_file_range()
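
For illustration only, the same "try the offload path, otherwise fall back" shape can be sketched from userspace with the existing copy_file_range(2) syscall. This is not the proposed kernel code: copy_range() and copy_emulate() are hypothetical names, and the set of errno values treated as "offload not available here" is an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulation path: plain read/write through a bounce buffer. */
static ssize_t copy_emulate(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(fd_in, buf, want);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* EOF on the source */

		/* write() may be short, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Try copy_file_range() first and fall back to copy_emulate() when the
 * kernel refuses the request for this pair of descriptors. Both paths
 * use and advance the current file offsets.
 */
static ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;	/* EOF on the source */

		/* Assumed "not supported here" errors: fall back to read/write */
		if (errno == EINVAL || errno == EXDEV ||
		    errno == ENOSYS || errno == EOPNOTSUPP) {
			ssize_t e = copy_emulate(fd_in, fd_out, len - done);

			return e < 0 ? -1 : (ssize_t)(done + (size_t)e);
		}
		return -1;
	}
	return (ssize_t)done;
}
```

A caller opens both descriptors, positions them, and calls copy_range(fd_in, fd_out, len); the return value is the number of bytes copied, or -1 with errno set.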
> > When I was researching pipe/splice code for supporting ublk zero
> > copy[1], I got some ideas for async copy_file_range(), such as
> > io_uring based direct splice and a user backed intermediate buffer,
> > still zero copy. If these ideas are finally implemented, we could get
> > super-fast generic offload copy, and bdev copy is really covered too.
> >
> > [1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming.lei@redhat.com/
>
> Seems interesting, we will take a look into this.
BTW, that is probably one direction of ublk's async zero copy IO too.

Thanks,
Ming