Thread (15 messages) 15 messages, 3 authors, 2026-05-29

Re: [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: 2026-05-29 08:32:46

On Fri, May 29, 2026 at 09:14:28AM +0200, Paolo Abeni wrote:
On 5/29/26 7:52 AM, Steffen Klassert wrote:
quoted
Ccing the author of this patch.

On Thu, May 28, 2026 at 03:44:14PM +0200, Paolo Abeni wrote:
quoted
On 5/27/26 10:41 AM, Steffen Klassert wrote:
quoted
From: e521588 <redacted>

In esp_output_tail(), when esp->inplace is false, the old skb page frags
are replaced with a new page from the xfrm page_frag cache. The source
scatterlist (sg) is built from the old frags before the replacement, and
esp_ssg_unref() is responsible for releasing the old page references
after the crypto operation completes.

However, if the second skb_to_sgvec() call (which builds the destination
scatterlist from the new page) fails, the code jumps to error_free which
only calls kfree(tmp). The old page frag references captured in the
source scatterlist are never released:

  1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
  2. nr_frags is set to 1 and frag[0] is replaced with the new page
  3. Second skb_to_sgvec() fails -> goto error_free
  4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
  5. kfree_skb() only releases frag[0] (the new page), not the old ones

Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
unconditionally unrefs the source scatterlist frags without checking
req->src and req->dst, since those fields are not yet initialized by
aead_request_set_crypt() at the point of the error. Existing callers
pass false to preserve the original behavior.

The same issue exists in both esp4 and esp6 as the code is identical.

Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")

Signed-off-by: Alessandro Schino <redacted>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 12 +++++++-----
 net/ipv6/esp6.c | 12 +++++++-----
 2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 6a5febbdbee4..8314d7bddcb7 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -96,7 +96,7 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
 			     __alignof__(struct scatterlist));
 }
 
-static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
+static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb, bool already_unref)
 {
 	struct crypto_aead *aead = x->data;
 	int extralen = 0;
@@ -113,7 +113,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
 	/* Unref skb_frag_pages in the src scatterlist if necessary.
 	 * Skip the first sg which comes from skb->data.
 	 */
-	if (req->src != req->dst)
+	if (already_unref || req->src != req->dst)
 		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
 			skb_page_unref(page_to_netmem(sg_page(sg)),
 				       skb->pp_recycle);
Sashiko points the above is still unsafe:
For the original patch submission, sashiko reported 'failed
to apply' because it was targeted to the ipsec tree. Now
with the full pull request it applied and it found issues,
but now it is too late for me to fix this.

Would it be possible to integrate subtrees of net and net-next
into the netdev CI? This would avoid such problems in future.
quoted
---
When the new caller passes already_unref = true, can the loop still be
reached safely?
Looking at the allocation and request setup in esp_output_tail():
	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
	...
	req = esp_tmp_req(aead, iv);
	sg  = esp_req_sg(aead, req);
esp_alloc_tmp() calls kmalloc(len, GFP_ATOMIC) — the buffer is not
zeroed.
esp_tmp_req() only does aead_request_set_tfm(req, aead), which assigns
req->base.tfm. It does not touch req->src or req->dst.
The local variable sg is filled in by skb_to_sgvec(skb, sg, ...) but
nothing copies sg into req->src; that assignment is performed later by
aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv).
In the new error path the call sequence is:
	err = skb_to_sgvec(skb, dsg, ...);
	if (unlikely(err < 0)) {
		esp_ssg_unref(x, tmp, skb, true);   /* before set_crypt */
		goto error_free;
	}
	...
	aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv);
Inside esp_ssg_unref() with already_unref = true, the predicate is
short-circuited but the loop body still dereferences req->src:
	if (already_unref || req->src != req->dst)
		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
			skb_page_unref(page_to_netmem(sg_page(sg)),
				       skb->pp_recycle);
Since req->src has not been initialized at this point (the kmalloc
buffer holds whatever was there before, and aead_request_set_crypt()
has not yet run), sg_next(req->src) walks an arbitrary pointer,
sg_page(sg) reads sg->page_link from arbitrary memory, and
skb_page_unref(page_to_netmem(...), ...) then drops a refcount on a
fabricated page.
The commit message also seems to acknowledge this:
    Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
    unconditionally unrefs the source scatterlist frags without checking
    req->src and req->dst, since those fields are not yet initialized by
    aead_request_set_crypt() at the point of the error.
If req->src/req->dst are not yet initialized, how can the loop
sg = sg_next(req->src) be valid?  Should the helper instead walk the
local sg returned by esp_req_sg(aead, req) (which was actually
populated by the first skb_to_sgvec()), or should
aead_request_set_crypt() be moved before the failure check so that
req->src/req->dst are well defined?
The same construct appears in net/ipv6/esp6.c:
---

And the SoB is incorrect for this patch. Do you rebase your tree? (I'm
not suggesting starting that otherwise!!!) If so you could possibly drop
this patch ?!?
No, I never rebased the ipsec tree so far. I could revert it, or you can
do so if you don't accept the patch as is. Rebasing would be the last
resort as this creates problems for everyone who cloned my tree.
Sure, I just could not remember if xfrm tree was usually rebased or not.
A revert or an incremental patch are just fine.
I'll revert it for now and resend the pull reuest.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help