Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: David Laight <hidden>
Date: 2026-06-05 12:19:48
Also in:
linux-fsdevel, linux-mm, linux-patches, lkml, netdev
On Fri, 5 Jun 2026 11:43:45 +0200 Stefan Metzmacher wrote:
> Hi Linus,
>
>>> Am I understanding correctly that this will completely break zerocopy sendfile?
>>
>> Very much, yes.
>>
>> And it's worth making it very very clear that ABSOLUTELY NONE of the recent big security bugs were in splice. They were all in the networking and crypto code that just didn't deal with shared data correctly.
>>
>> So in that sense, it's a bit sad to discuss castrating splice. But it's probably still the right thing to at least try.
>>
>> I've seen very impressive benchmark numbers over the years, but they've often smelled more like benchmarketing than actual real work.
>>
>> There's also a real possibility that a lot of the sendfile / splice advantage has little to do with zero-copy, and more to do with the cost of mapping and maintaining buffers in user space.
>>
>> If you are sending file data using plain reads and writes, it's not just the "copy from user space to socket data structures". There's also the cost of populating user space in the first place: page faults for mmap made *that* historical copy avoidance basically a fairy tale. And not using mmap means that you have the cost of double caching in the kernel _and_ user space etc.
>>
>> So sendfile() as a concept (whether you use combinations of splice() system calls or the sendfile system call itself) isn't necessarily only about the zero-copy, it's really also about avoiding the user space memory management.
>
> I don't think so.
>
> OK, maybe for webservers just serving tiny html files, that's true. But for me with Samba it's really the copy_to/from_iter() that is the major factor.
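A minimal C sketch of the two paths contrasted above, assuming Linux 2.6.33 or later (where sendfile(2) accepts any file, not only a socket, as the output descriptor). The helper names copy_read_write() and copy_sendfile() and the 64 KiB buffer size are hypothetical choices for illustration, not code from this patch series. In the first path the application owns a buffer that read() fills and write() drains, which is roughly where the kernel-side copy_to_iter()/copy_from_iter() cost comes from; in the second there is no user-space buffer at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, path 1: read() copies file data into a user-space
 * buffer, write() copies it back into the kernel. The buffer has to be
 * allocated, faulted in and managed by the application. */
static ssize_t copy_read_write(int in_fd, int out_fd)
{
    enum { BUF_SIZE = 64 * 1024 };      /* arbitrary buffer size */
    char *buf = malloc(BUF_SIZE);
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (buf == NULL)
        return -1;

    for (;;) {
        ssize_t n = read(in_fd, buf, BUF_SIZE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {                   /* 0 = EOF, < 0 = error */
            if (n < 0)
                total = -1;
            break;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0) {
                free(buf);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    free(buf);
    return total;
}

/* Hypothetical helper, path 2: sendfile() lets the kernel move the data
 * from the page cache to out_fd; no user-space buffer exists at all. */
static ssize_t copy_sendfile(int in_fd, int out_fd, size_t count)
{
    size_t done = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    while (done < count) {
        /* NULL offset: read from, and advance, in_fd's file position */
        ssize_t n = sendfile(out_fd, in_fd, NULL, count - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* EOF before count bytes */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Both helpers produce identical output; the sketch deliberately says nothing about which one is faster, since that is exactly what the thread is arguing about.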
Is that copy also doing the IP checksum? I really can't tell from the code (it does sometimes, even for TCP). But I can't help feeling that optimisation is well past its sell-by date.

-- David
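To make the checksum question concrete: a "copy that also does the checksum" folds the Internet checksum (RFC 1071) into the copy loop, so each byte is read only once. The function below is a hypothetical, byte-at-a-time illustration of that idea under that assumption; it is not the kernel's implementation, whose checksum-and-copy helpers are optimised and architecture-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * RFC 1071 Internet checksum of the data, computed in the same pass. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    assert(len == 0 || (dst != NULL && src != NULL));
    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        /* add the pair as one 16-bit word in network (big-endian) order */
        sum += ((uint64_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                      /* odd trailing byte */
        dst[i] = src[i];
        sum += (uint64_t)src[i] << 8;   /* implicitly padded with a zero byte */
    }
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;              /* ones' complement of the sum */
}
```

For the 8-byte example in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the folded sum is 0xddf2, so the function returns its complement, 0x220d.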
> We can use io_uring with IOSQE_ASYNC to offload the CPU time wasted on memcpy to other cores, but that still wastes a lot of resources.
>
> For the filesystem => socket case, we can use IORING_OP_SENDMSG_ZC, which at least removes the copy_from_iter() in the sendmsg path, but IORING_OP_READV into buffers of up to 8 MBytes still wastes CPU in copy_to_iter().
>
> With smbdirect and RDMA offload over 2x200 GBit/s links, only ~33 GBytes/s is reached, and the server CPU (even when using multiple cores) is the limit. Without the memcpy waste, ~46 GBytes/s is easily reached and the only limit is the network link.
>
> Another solution might be a version of copy_to/from_iter() that uses async_memcpy(), but I haven't had time to experiment with that yet. A new preadv2/pwritev2 flag could perhaps control it, so that the application can decide what's better.
>
> But without an alternative, please don't kill splice.
>
> A lot of people are frustrated because they bought hardware that can handle a lot of throughput, but with the default of SMB over TCP, for example, they get no more than 3.5 GBytes/s on a 100 GBit/s link that can handle ~11 GBytes/s. io_uring and splice are key to fixing that.
>
> Thanks!
> metze
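For reference, the "combinations of splice() system calls" mentioned earlier in the thread look roughly like this: data goes from the file into a pipe and from the pipe to the destination, and never passes through a user-space buffer. This is a minimal sketch assuming Linux and glibc (splice() is declared in <fcntl.h> with _GNU_SOURCE); copy_splice() is a hypothetical helper name and the error handling is deliberately simple. Whether the file pages end up shared with the destination or copied is a kernel implementation detail; the sketch only shows the user-visible call sequence.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* splice() with _GNU_SOURCE */
#include <unistd.h>

/* Hypothetical helper: move up to count bytes from in_fd (a regular file)
 * to out_fd (in a real server, a socket) through a pipe with splice(2). */
static ssize_t copy_splice(int in_fd, int out_fd, size_t count)
{
    int p[2];
    size_t done = 0;
    ssize_t ret;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (done < count) {
        /* file -> pipe; NULL offset uses and advances in_fd's position */
        ssize_t queued = splice(in_fd, NULL, p[1], NULL, count - done, 0);
        if (queued < 0 && errno == EINTR)
            continue;
        if (queued < 0) {
            ret = -1;
            goto out;
        }
        if (queued == 0)
            break;                      /* EOF */

        /* pipe -> destination: drain everything just queued in the pipe */
        while (queued > 0) {
            ssize_t n = splice(p[0], NULL, out_fd, NULL, (size_t)queued, 0);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {
                ret = -1;
                goto out;
            }
            queued -= n;
            done += (size_t)n;
        }
    }
    ret = (ssize_t)done;
out:
    close(p[0]);
    close(p[1]);
    return ret;
}
```

With a socket as out_fd, passing SPLICE_F_MORE on the pipe-to-socket splice() is the documented hint that more data will follow.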