Re: [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: 2026-05-29 08:32:46
On Fri, May 29, 2026 at 09:14:28AM +0200, Paolo Abeni wrote:
On 5/29/26 7:52 AM, Steffen Klassert wrote:
Ccing the author of this patch.

On Thu, May 28, 2026 at 03:44:14PM +0200, Paolo Abeni wrote:
On 5/27/26 10:41 AM, Steffen Klassert wrote:
From: e521588 <redacted>

In esp_output_tail(), when esp->inplace is false, the old skb page frags are replaced with a new page from the xfrm page_frag cache. The source scatterlist (sg) is built from the old frags before the replacement, and esp_ssg_unref() is responsible for releasing the old page references after the crypto operation completes.

However, if the second skb_to_sgvec() call (which builds the destination scatterlist from the new page) fails, the code jumps to error_free, which only calls kfree(tmp). The old page frag references captured in the source scatterlist are never released:

  1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
  2. nr_frags is set to 1 and frag[0] is replaced with the new page
  3. Second skb_to_sgvec() fails -> goto error_free
  4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
  5. kfree_skb() only releases frag[0] (the new page), not the old ones

Fix this by adding a bool parameter to esp_ssg_unref() that, when true, unconditionally unrefs the source scatterlist frags without checking req->src and req->dst, since those fields are not yet initialized by aead_request_set_crypt() at the point of the error. Existing callers pass false to preserve the original behavior.

The same issue exists in both esp4 and esp6 as the code is identical.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Alessandro Schino <redacted>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 12 +++++++-----
 net/ipv6/esp6.c | 12 +++++++-----
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 6a5febbdbee4..8314d7bddcb7 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -96,7 +96,7 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
 			     __alignof__(struct scatterlist));
 }
 
-static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
+static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb, bool already_unref)
 {
 	struct crypto_aead *aead = x->data;
 	int extralen = 0;
@@ -113,7 +113,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
 	/* Unref skb_frag_pages in the src scatterlist if necessary.
 	 * Skip the first sg which comes from skb->data.
 	 */
-	if (req->src != req->dst)
+	if (already_unref || req->src != req->dst)
 		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
 			skb_page_unref(page_to_netmem(sg_page(sg)),
 				       skb->pp_recycle);

Sashiko points out that the above is still unsafe:

For the original patch submission, sashiko reported 'failed to apply' because it was targeted to the ipsec tree. Now, with the full pull request, it applied and found issues, but now it is too late for me to fix this.

Would it be possible to integrate subtrees of net and net-next into the netdev CI? This would avoid such problems in future.
---
When the new caller passes already_unref = true, can the loop still be reached safely?

Looking at the allocation and request setup in esp_output_tail():

	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
	...
	req = esp_tmp_req(aead, iv);
	sg = esp_req_sg(aead, req);

esp_alloc_tmp() calls kmalloc(len, GFP_ATOMIC), so the buffer is not zeroed. esp_tmp_req() only does aead_request_set_tfm(req, aead), which assigns req->base.tfm; it does not touch req->src or req->dst. The local variable sg is filled in by skb_to_sgvec(skb, sg, ...), but nothing copies sg into req->src; that assignment is performed later by aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv).

In the new error path the call sequence is:

	err = skb_to_sgvec(skb, dsg, ...);
	if (unlikely(err < 0)) {
		esp_ssg_unref(x, tmp, skb, true);	/* before set_crypt */
		goto error_free;
	}
	...
	aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv);

Inside esp_ssg_unref() with already_unref = true, the predicate is short-circuited, but the loop still dereferences req->src:

	if (already_unref || req->src != req->dst)
		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
			skb_page_unref(page_to_netmem(sg_page(sg)),
				       skb->pp_recycle);

Since req->src has not been initialized at this point (the kmalloc buffer holds whatever was there before, and aead_request_set_crypt() has not yet run), sg_next(req->src) walks an arbitrary pointer, sg_page(sg) reads sg->page_link from arbitrary memory, and skb_page_unref(page_to_netmem(...), ...) then drops a refcount on a fabricated page.

The commit message also seems to acknowledge this:

	Fix this by adding a bool parameter to esp_ssg_unref() that, when true, unconditionally unrefs the source scatterlist frags without checking req->src and req->dst, since those fields are not yet initialized by aead_request_set_crypt() at the point of the error.

If req->src/req->dst are not yet initialized, how can the loop sg = sg_next(req->src) be valid?
Should the helper instead walk the local sg returned by esp_req_sg(aead, req) (which was actually populated by the first skb_to_sgvec()), or should aead_request_set_crypt() be moved before the failure check so that req->src/req->dst are well defined?

The same construct appears in net/ipv6/esp6.c:
---

And the SoB is incorrect for this patch. Do you rebase your tree? (I'm not suggesting starting that otherwise!!!) If so, you could possibly drop this patch ?!?

No, I have never rebased the ipsec tree so far. I could revert it, or you can do so if you don't accept the patch as is. Rebasing would be the last resort, as this creates problems for everyone who cloned my tree.

Sure, I just could not remember if the xfrm tree was usually rebased or not. A revert or an incremental patch is just fine.
I'll revert it for now and resend the pull request.