Thread (21 messages) 21 messages, 8 authors, 2025-03-13

Re: Kernel oops with 6.14 when enabling TLS

From: Matthew Wilcox <willy@infradead.org>
Date: 2025-03-04 16:53:16
Also in: linux-block, linux-mm, linux-nvme

On Tue, Mar 04, 2025 at 05:32:32PM +0100, Hannes Reinecke wrote:
On 3/4/25 17:14, Matthew Wilcox wrote:
quoted
I thought we'd done all the work needed to get rid of these pointless
refcount bumps.  Turns out that's only on the block side (eg commit
e4cc64657bec).  So what does networking need in order to understand
that some iovecs do not need to mess with the refcount?
The network stack needs to get hold of the page while transmission is
ongoing, as there is potentially rather deep queueing involved,
requiring several calls to sendmsg() and friends before the page is finally
transmitted. And maybe some post-processing (checksums,
digests, you name it), too, all of which require the page to be there.

It's all so jumbled up ... personally, I would _love_ to do away with
__iov_iter_get_pages_alloc(). Allocating a page array? Seriously?

And the problem with that is that it's always takes a page(!) reference,
completely oblivious to the fact whether you even _can_ take a page
reference (eg for tail pages); we've hit this problem several times now
(check for sendpage_ok() ...).
Calling get_page() / put_page() on a tail page is fine -- that just
redirects to the head page.  But calling it on a slab never made any
sense; at best it gets you the equivalent of TYPESAFE_BY_RCU -- that is,
the object can be freed and reallocated, but the underlying slab will
not be reallocated to some other purpose.
But that's not the real issue; real issue is that the page reference is
taken down in the very bowels of __iov_iter_get_pages_alloc(), but needs
to be undone by the _caller_. Who might (or might not) have an idea
that he needs to drop the reference here.
That's why there is no straightforward conversion; you need to audit
each and every caller and try to find out where the page reference (if any)
is dropped.
Bah.

Can't we (at the very least) leave it to the caller of
__iov_iter_get_pages() to get a page reference (he has access to the page
array, after all ...)? That would make the interface slightly
better, and it'll be far more obvious to the caller what needs
to be done.
Right, that's what happened in the block layer.  We mark the bio with
BIO_PAGE_PINNED if the pincount needs to be dropped.  As a transitional
period, we had BIO_PAGE_REFFED which indicated that the page refcount
needed to be dropped.  Perhaps there's something similar that network
could be doing.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help