Thread (21 messages) 21 messages, 8 authors, 2025-03-13

Re: Kernel oops with 6.14 when enabling TLS

From: Vlastimil Babka <hidden>
Date: 2025-03-06 09:15:14
Also in: linux-block, linux-mm, linux-nvme

On 3/5/25 12:43, Hannes Reinecke wrote:
On 3/5/25 09:58, Vlastimil Babka wrote:
quoted
On 3/5/25 09:20, Hannes Reinecke wrote:
quoted
On 3/4/25 20:44, Vlastimil Babka wrote:
quoted
On 3/4/25 20:39, Hannes Reinecke wrote:
[ .. ]
quoted
quoted
Good news and bad news ...
Good news: TLS works again!
Bad news: no errors.
Wait, did you add a WARN_ON_ONCE() to the put_page() as I suggested? If yes
and there was no error, it would have to be leaking the page. Or the path
uses folio_put() and we'd need to put the warning there.
That triggers:
...
quoted
Not surprisingly, though, as the original code did a get_page(), so
there had to be a corresponding put_page() somewhere.
Is is this one? If there's no more warning afterwards, that should be it.
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 61f3f3d4e528..b37d99cec069 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -182,9 +182,14 @@ static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i,
  
         /* When the skb owns the memory we free it from consume_skb path. */
         if (!msg->skb) {
+               struct folio *folio;
+
                 if (charge)
                         sk_mem_uncharge(sk, len);
-               put_page(sg_page(sge));
+
+               folio = page_folio(sg_page(sge));
+               if (!folio_test_slab(folio))
+                       folio_put(folio);
         }
         memset(sge, 0, sizeof(*sge));
         return len;
Oh, sure. But what annoys me: why do we have to care?

When doing I/O _all_ data is stuffed into bvecs via
bio_add_page(), and after that information about the
origin is lost; any iteration on the bio will be a bvec
iteration.
Previously we could just do a bvec iteration, get a reference
for each page, and start processing.
AFAIU there's BIO_PAGE_PINNED that controls whether the pages are pinned, as
there are usecases where it makes sense to do that (userspace pages?). And
__bio_release_pages() can be removing the last pin and freeing the pages.

But this is a case where the buffer is a kmalloc() allocation, so somebody
has to do the corresponding kfree() when the messages are processed. A pin
on the slab folio where the kmalloc() resides helps nothing and as willy
says it's just unnecessary overhead of atomic allocations.
Now suddenly the caller has to check if it's a slab page and don't
get a reference for that. Not only that, he also has to remember
to _not_ drop the reference when he's done.
The caller did kmalloc() and will have to do kfree(). I guess it's about
telling the intermediate layers via something similar like BIO_PAGE_PINNED
whether the pages should be pinned or not.
And, of course, tracing get_page() and the corresponding put_page()
calls through all the layers.
Really?

Cheers,

Hannes
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help