Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)
From: Jakub Kicinski <kuba@kernel.org>
Date: 2023-06-22 20:28:42
Also in:
linux-mm, lkml
On Thu, 22 Jun 2023 20:40:43 +0100 David Howells wrote:
quoted
How did that happen? I thought MSG_SPLICE_PAGES comes from former sendpage users and sendpage can't operate on slab pages.Some of my patches, take the siw one for example, now aggregate all the bits that make up a message into a single sendmsg() call, including any protocol header and trailer in the same bio_vec[] as the payload where before it would have to do, say, sendmsg+sendpage+sendpage+...+sendpage+sendmsg.
Maybe it's just me but I'd prefer to keep the clear rule that splice operates on pages not slab objects. SIW is the software / fake implementation of RDMA, right? You couldn't have picked a less important user :( Paolo indicated that he'll take a look tomorrow, we'll see what he thinks.
I'm trying to make it so that I make the minimum number of sendmsg calls (ie. 1 where possible) and the loop that processes the data is inside of that.
The in-kernel users can be fixed to not use slab, and user space can't feed us slab objects.
This offers the opportunity, at least in the future, to append slab data to an already-existing private fragment in the skbuff.
Maybe we can get Eric to comment. The ability to identify "frag type" seems cool indeed, but I haven't thought about using it to attach slab objects.
quoted
The locking is to local_bh_disable(). Does the milliont^w new frag allocator have any additional benefits?It is shareable because it does locking. Multiple sockets of multiple protocols can share the pages it has reserved. It drops the lock around calls to the page allocator so that GFP_KERNEL/GFP_NOFS can be used with it. Without this, the page fragment allocator would need to be per-socket, I think, or be done further up the stack where the higher level drivers would have to have a fragment bucket per whatever unit they use to deal with the lack of locking.
There's also the per task frag which can be used under normal conditions (sk_use_task_frag).
Doing it here makes cleanup simpler since I just transfer my ref on the fragment to the skbuff frag list and it will automatically be cleaned up with the skbuff. Willy suggested that I just allocate a page for each thing I want to copy, but I would rather not do that for, say, an 8-byte bit of protocol data.
TBH my intuition would also be get a full page and let the callers who care about performance fix themselves. Assuming we want to let slab objects in in the first place.