Thread: 79 messages, 7 authors, 2023-04-14

Re: [RFC PATCH v2 30/48] siw: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage to transmit

From: David Howells <dhowells@redhat.com>
Date: 2023-03-29 15:34:33
Also in: linux-fsdevel, linux-mm, linux-rdma, lkml

Bernard Metzler [off-list ref] wrote:
> > When transmitting data, call down into TCP using a single sendmsg with
> > MSG_SPLICE_PAGES to indicate that content should be spliced rather than
> > performing several sendmsg and sendpage calls to transmit header, data
> > pages and trailer.
> >
> > To make this work, the data is assembled in a bio_vec array and attached to
> > a BVEC-type iterator.  The header and trailer (if present) are copied into
> > page fragments that can be freed with put_page().
>
> I like it a lot if it still keeps zero copy sendpage() semantics for
> the cases the driver can make use of data transfers w/o copy.
> Is 'msg.msg_flags |= MSG_SPLICE_PAGES' doing that magic?

Yes.  MSG_SPLICE_PAGES indicates that you want the socket to retain your
buffer and pass it directly to the device.  Note that it's only a hint,
however: pages that are unspliceable (e.g. because they belong to the slab)
will get copied into a page fragment instead.  Further, if the device cannot
support a vector, then the hint can be ignored and all the data can be copied
as normal.
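
The fallback rules just described can be written down as a small decision
function.  This is only a toy userspace model of the behaviour stated in this
mail, not kernel code: the names tx_action, plan_page(), dev_takes_vector and
page_spliceable are all hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: what happens to one page handed to the socket. */
enum tx_action {
	TX_COPY,	/* data is copied (e.g. into a page fragment) */
	TX_SPLICE,	/* socket retains the caller's page directly */
};

/*
 * splice_hint:      caller set the MSG_SPLICE_PAGES-style hint
 * dev_takes_vector: device can transmit from a vector of pages
 * page_spliceable:  page may be retained (false for slab memory)
 */
static enum tx_action plan_page(bool splice_hint, bool dev_takes_vector,
				bool page_spliceable)
{
	if (!splice_hint || !dev_takes_vector)
		return TX_COPY;		/* hint absent or ignored: copy as normal */
	if (!page_spliceable)
		return TX_COPY;		/* unspliceable page: copy instead */
	return TX_SPLICE;
}
```

The point of the model is that the caller never has to check these conditions
itself: it sets the hint once and the copy fallback happens per page.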

> 'splicing' suggests just merging pages to me.

'splicing' as in what the splice system call does.

Unfortunately, MSG_ZEROCOPY is already a (different) thing.

> It would simplify the transmit code path substantially, also getting
> rid of kmap_local_page()/kunmap_local() sequences for multi-fragment
> sendmsg()'s.

If the ITER_ITERLIST iterator is accepted, then siw would be able to mix
KVEC and BVEC iterators, e.g. what I did for sunrpc here:

	https://lore.kernel.org/linux-fsdevel/20230329141354.516864-42-dhowells@redhat.com/T/#u

This means that in siw_tx_hdt() where I made it copy data into page fragments
using page_frag_memdup() and attach that to a bvec:

	hdr_len = c_tx->ctrl_len - c_tx->ctrl_sent;
	h = page_frag_memdup(NULL, hdr, hdr_len, GFP_NOFS, ULONG_MAX);
	if (!h)
		goto done;
	bvec_set_virt(&bvec[0], h, hdr_len);
	seg = 1;

it can just set up a kvec instead.
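
As a rough userspace analogy (not siw code), struct iovec plays the role that
struct kvec plays in the kernel: each entry points at a buffer in place, so the
header needs no duplicate copy before a single sendmsg() call carries header,
data and trailer together.  send_hdt() and demo_roundtrip() are hypothetical
helpers written for this sketch, and MSG_SPLICE_PAGES itself is not involved
here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send header, data and trailer with one sendmsg(), pointing at each
 * buffer in place rather than copying them into a staging buffer. */
static ssize_t send_hdt(int fd, const void *hdr, size_t hdr_len,
			const void *data, size_t data_len,
			const void *trl, size_t trl_len)
{
	struct iovec iov[3];
	struct msghdr msg;

	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = hdr_len;
	iov[1].iov_base = (void *)data;
	iov[1].iov_len = data_len;
	iov[2].iov_base = (void *)trl;
	iov[2].iov_len = trl_len;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = 3;

	return sendmsg(fd, &msg, 0);
}

/* Round-trip the three pieces through a socketpair; 0 on success. */
static int demo_roundtrip(void)
{
	static const char expect[] = "HDR|payload|CRC";
	char out[sizeof(expect)] = {0};
	int sv[2];
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	ssize_t sent = send_hdt(sv[0], "HDR|", 4, "payload", 7, "|CRC", 4);
	if (sent == (ssize_t)strlen(expect) &&
	    recv(sv[1], out, (size_t)sent, MSG_WAITALL) == sent &&
	    memcmp(out, expect, (size_t)sent) == 0)
		ok = 0;

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

The receiver sees one contiguous byte stream, which is the same property the
single-sendmsg transmit path relies on.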

Unfortunately, it's not so easy to get rid of all of the kmap'ing as we need
to do some of it to do the hashing.

David