Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2

From: Stefan Metzmacher <metze@samba.org>
Date: 2026-06-05 15:15:56
Also in: linux-api, linux-fsdevel, linux-mm, linux-patches, lkml

Hi Linus,

> On Wed, 3 Jun 2026 at 15:23, Andy Lutomirski [off-list ref] wrote:
>> So I'm suspicious that you've possibly made bugs much (MUCH) harder to
>> exploit, but the underlying awful code and opportunity for bugs is
>> still there.  MSG_SPLICE_PAGES is still around, and there is still
>> (AFAICS) no actual coherent description of what it means.
>
> I don't disagree. I've only looked at the filesystem side.
>
> The networking side does some odd stuff too (and I did look at some of
> that, and had to be edumacated by Jakub on some of the subtler rules
> for what skb data sharing is ok and when it's not - really not my
> area).
>
> But at least MSG_SPLICE_PAGES should be a kernel-internal-only
> interface, and once you don't share page cache pages with networking
> code I think that kneecaps a lot of the attacks.
>
> So that's really the aim here for me - at least _attempting_ to go
> "maybe we can just limit splice enough that it doesn't even *matter*
> when networking does something odd and questionable".

While prototyping a smbdirect_splice_to_bvecs() in order to
use rdma_rw_ctx_init_bvec(), I found things like pipe_buf_try_steal(),
dug a bit deeper into struct address_space, and found helpers like
mapping_mapped(), mapping_tagged(), mapping_deny_writable(),
mapping_allow_writable() and similar.

With that I'm wondering if we could allow splicing of pages only if
nobody has mmap'ed the file (mapping_mapped() returns 0) and the page
is not tagged with any of PAGECACHE_TAG_{DIRTY,WRITEBACK,TOWRITE}.
Once a page is spliced, we would tag it in the i_pages xarray with a
new PAGECACHE_TAG_SPLICED. In all other cases the page would be copied.
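
To make the decision rule concrete, here is a rough userspace model of it.
This is only a sketch and not kernel code: every model_* name is made up for
illustration, the tag bits merely stand in for the xarray marks, and
mmap_count is an assumed stand-in for what mapping_mapped() reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these types exist in the kernel. */
#define MODEL_NPAGES 8

enum model_tag {
	MODEL_TAG_DIRTY     = 1u << 0, /* stands in for PAGECACHE_TAG_DIRTY */
	MODEL_TAG_WRITEBACK = 1u << 1, /* stands in for PAGECACHE_TAG_WRITEBACK */
	MODEL_TAG_TOWRITE   = 1u << 2, /* stands in for PAGECACHE_TAG_TOWRITE */
	MODEL_TAG_SPLICED   = 1u << 3, /* the proposed new tag */
};

struct model_mapping {
	unsigned int mmap_count;         /* nonzero models mapping_mapped() != 0 */
	unsigned int tags[MODEL_NPAGES]; /* per-index tag bits, modelling i_pages marks */
};

enum model_action { MODEL_COPY, MODEL_SPLICE };

/*
 * Decide whether the page at @index may be handed out by reference.
 * On MODEL_SPLICE the page is also marked as spliced; on MODEL_COPY
 * the mapping is left untouched and the caller is expected to copy.
 */
enum model_action model_splice_page(struct model_mapping *m, size_t index)
{
	const unsigned int busy =
		MODEL_TAG_DIRTY | MODEL_TAG_WRITEBACK | MODEL_TAG_TOWRITE;

	assert(index < MODEL_NPAGES);

	if (m->mmap_count != 0)      /* somebody has the file mmap'ed */
		return MODEL_COPY;
	if (m->tags[index] & busy)   /* page is dirty or under writeback */
		return MODEL_COPY;

	m->tags[index] |= MODEL_TAG_SPLICED;
	return MODEL_SPLICE;
}
```

In this model a clean page of an unmapped file gets spliced and tagged,
while a dirty page, or any page of a mapped file, falls back to a copy.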

Then any call to do_mmap() or vfs_writev() at the high level, and
at the lower levels most likely filemap_get_entry()/filemap_map_pages(),
would remove the pages marked with PAGECACHE_TAG_SPLICED
and allocate new pages for the page cache at the related index.
It would be a bit similar to invalidate_inode_pages2_range() for direct I/O
writes. As an optimization, we could maybe just clear PAGECACHE_TAG_SPLICED
if the refcount of the page is 1.
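
The replace-before-modify step could be modelled in userspace roughly like
this. Again just a sketch with made-up sim_* names: the spliced[] array stands
in for the proposed tag, and copying the old bytes into the fresh page is my
assumption standing in for re-reading the data from the backing store.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 16
#define SIM_NPAGES 4

struct sim_page {
	int refcount;                      /* one reference belongs to the page cache */
	unsigned char data[SIM_PAGE_SIZE];
};

struct sim_mapping {
	struct sim_page *pages[SIM_NPAGES];
	unsigned char spliced[SIM_NPAGES]; /* models the proposed PAGECACHE_TAG_SPLICED */
};

/*
 * Make sure the page cache no longer shares the page at @index with
 * splice consumers. Returns 0 on success, -1 on allocation failure.
 */
int sim_unshare_page(struct sim_mapping *m, size_t index)
{
	struct sim_page *old, *fresh;

	assert(index < SIM_NPAGES);
	old = m->pages[index];
	if (old == NULL || !m->spliced[index])
		return 0;

	if (old->refcount == 1) {
		/* Only the page cache still holds it: just drop the tag. */
		m->spliced[index] = 0;
		return 0;
	}

	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return -1;
	memcpy(fresh->data, old->data, SIM_PAGE_SIZE);
	fresh->refcount = 1;

	old->refcount--;          /* the cache drops its ref, consumers keep theirs */
	m->pages[index] = fresh;
	m->spliced[index] = 0;
	return 0;
}

/* Model of a write: unshare first, then fill the cache page with @byte. */
int sim_write(struct sim_mapping *m, size_t index, unsigned char byte)
{
	if (sim_unshare_page(m, index) != 0 || m->pages[index] == NULL)
		return -1;
	memset(m->pages[index]->data, byte, SIM_PAGE_SIZE);
	return 0;
}
```

After sim_write() on a page that is still referenced by a consumer, the
consumer's page keeps its old bytes and the page cache points at a fresh
page carrying the new data.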

This would also mean the content of spliced pages won't be changed
by future writes to the file, which removes the problem with unstable
pages and checksums.

It means the most common workload, e.g. a file only opened for
file serving (or simple opens in general), could still be
optimized.

Does that sound useful and doable?

metze