Thread (39 messages) 39 messages, 6 authors, 2023-06-23

Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)

From: Jakub Kicinski <kuba@kernel.org>
Date: 2023-06-22 20:28:42
Also in: linux-mm, lkml

On Thu, 22 Jun 2023 20:40:43 +0100 David Howells wrote:
quoted
How did that happen? I thought MSG_SPLICE_PAGES comes from former
sendpage users and sendpage can't operate on slab pages.  
Some of my patches, take the siw one for example, now aggregate all the bits
that make up a message into a single sendmsg() call, including any protocol
header and trailer in the same bio_vec[] as the payload where before it would
have to do, say, sendmsg+sendpage+sendpage+...+sendpage+sendmsg.
Maybe it's just me but I'd prefer to keep the clear rule that splice
operates on pages not slab objects. SIW is the software / fake
implementation of RDMA, right? You couldn't have picked a less
important user :(

Paolo indicated that he'll take a look tomorrow, we'll see what he
thinks.
I'm trying to make it so that I make the minimum number of sendmsg calls
(ie. 1 where possible) and the loop that processes the data is inside of that.
The in-kernel users can be fixed to not use slab, and user space can't
feed us slab objects.
This offers the opportunity, at least in the future, to append slab data to an
already-existing private fragment in the skbuff.
Maybe we can get Eric to comment. The ability to identify "frag type"
seems cool indeed, but I haven't thought about using it to attach
slab objects.
quoted
The locking is to local_bh_disable(). Does the milliont^w new frag
allocator have any additional benefits?  
It is shareable because it does locking.  Multiple sockets of multiple
protocols can share the pages it has reserved.  It drops the lock around calls
to the page allocator so that GFP_KERNEL/GFP_NOFS can be used with it.

Without this, the page fragment allocator would need to be per-socket, I
think, or be done further up the stack where the higher level drivers would
have to have a fragment bucket per whatever unit they use to deal with the
lack of locking.
There's also the per task frag which can be used under normal conditions
(sk_use_task_frag).
Doing it here makes cleanup simpler since I just transfer my ref on the
fragment to the skbuff frag list and it will automatically be cleaned up with
the skbuff.

Willy suggested that I just allocate a page for each thing I want to copy, but
I would rather not do that for, say, an 8-byte bit of protocol data.
TBH my intuition would also be get a full page and let the callers who
care about performance fix themselves. Assuming we want to let slab
objects in in the first place.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help