Thread: 38 messages, 5 authors, 2025-03-21

Re: [RFC -next 00/10] Add ZC notifications to splice and sendfile

From: Jens Axboe <axboe@kernel.dk>
Date: 2025-03-21 11:11:03
Also in: linux-api, linux-arch, linux-fsdevel, linux-kselftest, lkml

On 3/19/25 1:16 PM, Joe Damato wrote:
In general: it does seem a bit odd to me that there isn't a safe
sendfile syscall in Linux that uses existing completion notification
mechanisms.
Pretty natural, I think. sendfile(2) predates that by quite a bit, and
the last real change to sendfile was using splice underneath. Which I
did, and that was probably almost 20 years ago at this point...

I do think it makes sense to have a sendfile that's both fast and
efficient, and can be used sanely with buffer reuse without relying on
odd heuristics.
Just trying to tie this together in my head -- are you saying that
you think the kernel internals of sendfile could be changed in a
different way or that this is a userland problem (and they should use
the io_uring wrapper you suggested above)?
I'm saying that it of course makes sense to have a way to do sendfile
where you know when reuse is safe, and that we have an API that provides
that very nicely already without needing to add syscalls. If you used
io_uring for this, then the "tx is done, reuse is fine" notification is
just another notification, not anything special that needs new plumbing.
I would also argue that there are likely user apps out there that
use both sendmsg MSG_ZEROCOPY for certain writes (for data in
memory) and also use sendfile (for data on disk). One example would
be a reverse proxy that might write HTTP headers to clients via
sendmsg but transmit the response body with sendfile.

For those apps, the code to check the error queue already exists for
sendmsg + MSG_ZEROCOPY, so swapping in sendfile2 seems like an easy
way to ensure safe sendfile usage.
Sure, that is certainly possible. I didn't say that wasn't the case,
rather that the error queue approach is a work-around in the first place
for not having some kind of async notification mechanism for when it's
free to reuse.
Of course, I certainly agree that the error queue is a workaround.
But it works, apps use it, and it's fairly well known. I don't see any
reason, other than historical context, why sendmsg can use this
mechanism, splice can, but sendfile shouldn't?
My argument would be the same as for other features - if you can do it
simpler this other way, why not consider that? The end result would be
the same, you can do fast sendfile() with sane buffer reuse. But the
kernel side would be simpler, which is always a main goal for those of
us that have to maintain it.

Just adding sendfile2() works in the sense that it's an easier drop-in
replacement for an app, though the error queue side does mean it needs
to change anyway - it's not just replacing one syscall with another. And
if we want to be lazy, sure that's fine. I just don't think it's the
best way to do it when we literally have a mechanism that's designed for
this and works with reuse already with normal send zc (and receive side
too, in the next kernel).
It seems like you've answered the question I asked above and that
you are suggesting there might be a better and simpler sendfile2
kernel-side implementation that doesn't rely on splice internals at
all.

Am I following you? If so, I'll drop the sendfile2 stuff from this
series and stick with the splice changes only, if you are (at a high
level) OK with the idea of adding a flag for this to splice.

In the meantime, I'll take a few more reads through the io_uring code
to see if I can work out how sendfile2 might be built on top of that
instead of splice in the kernel.
Heh, I don't know how you jumped to that conclusion based on my feedback,
and it seems like it's solidified through other replies. No, I'm not saying
that the approach makes sense for the kernel, it makes some vague amount
of sense only on the premise of "oh but this is easy for applications as
they already know how to use sendfile(2)".

-- 
Jens Axboe