Thread (39 messages) 39 messages, 7 authors, 2025-02-20

Re: [PATCH net-next v3 5/6] net: devmem: Implement TX path

From: Mina Almasry <hidden>
Date: 2025-02-20 01:46:50
Also in: kvm, linux-doc, linux-kselftest, lkml, virtualization

On Wed, Feb 19, 2025 at 2:40 PM Pavel Begunkov [off-list ref] wrote:
On 2/17/25 23:26, Mina Almasry wrote:
quoted
On Thu, Feb 13, 2025 at 5:17 AM Pavel Begunkov [off-list ref] wrote:
...
quoted
quoted
quoted
quoted
quoted
It's asserting that sizeof(ubuf_info_msgzc) <= sizeof(skb->cb), and
I'm guessing increasing skb->cb size is not really the way to go.

What I may be able to do here is stash the binding somewhere in
ubuf_info_msgzc via union with fields we don't need for devmem, and/or
It doesn't need to account the memory against the user, and you
actually don't want that because dmabuf should take care of that.
So, it should be fine to reuse ->mmp.

It's also not a real sk_buff, so maybe maintainers wouldn't mind
reusing some more space out of it, if that would even be needed.
netmem skb are real sk_buff, with the modification that frags are not
We were discussing ubuf_info allocation, take a look at
msg_zerocopy_alloc(), it has nothing to do with netmems and all that.
Yes. My response was regarding the suggestion that we can use space in
devmem skbs however we want though.
Well, at least I didn't suggest that, assuming "devmem skbs" are skbs
filled with devmem frags. I think the confusion here is thinking
that skb->cb you mentioned above is about "devmem skbs", while it's
special skbs without data used only to piggy back ubuf allocation.
Ah, I see. I still don't see how we can just increase the size of
skb->cb when it's shared between these special skbs and regular skbs.
Functionally speaking, it'd be perfectly fine to get rid of the
warning and allocate it with kmalloc().
More suggestions to refactor unrelated things to force through a
msg->sg_from_iter approach.
...
quoted
quoted
quoted
But MSG_ZEROCOPY doesn't set msg->msg_ubuf. And not setting
msg->msg_ubuf fails to trigger msg->sg_from_iter altogether.

And also currently sg_from_iter isn't set up to take in a ubuf_info.
We'd need that if we stash the binding in the ubuf_info.
https://github.com/isilence/linux.git sg-iter-ops

I have old patches for all of that, they even rebased cleanly. That
should do it for you, and I need to send then regardless of devmem.
These patches help a bit, but do not make any meaningful dent in
addressing the concern I have in the earlier emails.

The concern is that we're piggybacking devmem TX on MSG_ZEROCOPY, and
currently the MSG_ZEROCOPY code carefully avoids any code paths
setting msg->[sg_from_iter|msg_ubuf].
Fwiw, with that branch you don't need ->msg_ubuf at all, just pass
it as an argument from tcp_sendmsg_locked() as usual, and
->sg_from_iter is gone from there as well.
quoted
If we want devmem to reuse both the MSG_ZEROCOPY mechanisms and the
msg->[sg_from_iter|ubuf_info] mechanism, I have to dissect the
MSG_ZEROCOPY code carefully so that it works with and without
setting msg->[ubuf_info|msg->sg_from_iter]. Having gone through this
rabbit hole so far I see that it complicates the implementation and
adds more checks to the fast MSG_ZEROCOPY paths.
If you've already done, maybe you can post it as a draft? At least
it'll be obvious why you say it's more complicated.
I don't have anything worth sharing. Just went down this rabbit hole
and saw a bunch of MSG_ZEROCOPY checks (!msg->msg_ubuf checks around
MSG_ZEROCOPY code) and restrictions (skb->cb size) need to be
addressed and checks to be added. From this thread you seem to be
suggesting more changes to force in a msg->sg_from_iter approach
adding to the complications.
quoted
The complication could be worth it if there was some upside, but I
don't see one tbh. Passing the binding down to
zerocopy_fill_skb_from_devmem seems like a better approach to my eye
so far
The upside is that 1) you currently you add overhead to common
path (incl copy),
You mean the unlikely() check for devmem before delegating to
skb_zerocopy_fill_from_devmem? Should be minimal.
2) passing it down through all the function also
have overhead to the zerocopy and MSG_ZEROCOPY path, which I'd
assume is comparable to those extra checks you have.
Complicating/refactoring existing code for devmem TCP to force in a
msg->sg_from_iter and save 1 arg passed down a couple of functions
doesn't seem like a good tradeoff IMO.
3) tcp would
need to know about devmem tcp and its bindings, while it all could
be in one spot under the MSG_ZEROCOPY check.
I don't see why this is binding to tcp somehow. If anything it makes
the devmem TX implementation follow closely MSG_ZEROCOPY, and existing
MSG_ZEROCOPY code would be easily extended for devmem TX without
having to also carry refactors to migrate to msg->sg_from_iter
approach (just grab the binding and pass it to
skb_zerocopy_iter_stream).
4) When you'd want
another protocol to support that, instead of a simple

ubuf = get_devmem_ubuf();

You'd need to plumb binding passing through the stack there as
well.
Similar to above, I think this approach will actually extend easier to
any protocol already using MSG_ZEROCOPY, because we follow that
closely instead of requiring refactors to force msg->sg_from_iter
approach.

5) And keeping it in one place makes it easier to keep around.

I just don't see why it'd be complicated, but maybe I miss
something, which is why a draft prototype would explain it
better than any words.
quoted
I'm afraid I'm going to table this for now. If there is overwhelming
consensus that msg->sg_from_iter is the right approach here I will
revisit, but it seems to me to complicate code without a significant
upside.
--
Pavel Begunkov

--
Thanks,
Mina
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help