Re: Kernel oops with 6.14 when enabling TLS
From: Vlastimil Babka <hidden>
Date: 2025-03-06 09:15:14
Also in:
linux-block, linux-mm, linux-nvme
On 3/5/25 12:43, Hannes Reinecke wrote:
On 3/5/25 09:58, Vlastimil Babka wrote:quoted
On 3/5/25 09:20, Hannes Reinecke wrote:quoted
On 3/4/25 20:44, Vlastimil Babka wrote:quoted
On 3/4/25 20:39, Hannes Reinecke wrote:[ .. ]quoted
quoted
Good news and bad news ... Good news: TLS works again! Bad news: no errors.Wait, did you add a WARN_ON_ONCE() to the put_page() as I suggested? If yes and there was no error, it would have to be leaking the page. Or the path uses folio_put() and we'd need to put the warning there.That triggers:...quoted
Not surprisingly, though, as the original code did a get_page(), so there had to be a corresponding put_page() somewhere.Is is this one? If there's no more warning afterwards, that should be it.diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 61f3f3d4e528..b37d99cec069 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c@@ -182,9 +182,14 @@ static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i, /* When the skb owns the memory we free it from consume_skb path. */ if (!msg->skb) { + struct folio *folio; + if (charge) sk_mem_uncharge(sk, len); - put_page(sg_page(sge)); + + folio = page_folio(sg_page(sge)); + if (!folio_test_slab(folio)) + folio_put(folio); } memset(sge, 0, sizeof(*sge)); return len;Oh, sure. But what annoys me: why do we have to care? When doing I/O _all_ data is stuffed into bvecs via bio_add_page(), and after that information about the origin is lost; any iteration on the bio will be a bvec iteration. Previously we could just do a bvec iteration, get a reference for each page, and start processing.
AFAIU there's BIO_PAGE_PINNED that controls whether the pages are pinned, as there are usecases where it makes sense to do that (userspace pages?). And __bio_release_pages() can be removing the last pin and freeing the pages. But this is a case where the buffer is a kmalloc() allocation, so somebody has to do the corresponding kfree() when the messages are processed. A pin on the slab folio where the kmalloc() resides helps nothing and as willy says it's just unnecessary overhead of atomic allocations.
Now suddenly the caller has to check if it's a slab page and don't get a reference for that. Not only that, he also has to remember to _not_ drop the reference when he's done.
The caller did kmalloc() and will have to do kfree(). I guess it's about telling the intermediate layers via something similar like BIO_PAGE_PINNED whether the pages should be pinned or not.
And, of course, tracing get_page() and the corresponding put_page() calls through all the layers. Really? Cheers, Hannes