Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
From: Wolfgang Walter <hidden>
Date: 2018-09-11 21:53:25
On Tuesday, 11 September 2018, 12:33:34, Steffen Klassert wrote:
> On Mon, Sep 10, 2018 at 10:18:47AM +0200, Kristian Evensen wrote:
> > Hi,
> >
> > Thanks everyone for all the effort in debugging this issue.
> >
> > On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert [off-list ref] wrote:
> > > The easy fix that could be backported to stable would be to check
> > > skb->dst for NULL and drop the packet in that case.
> >
> > Thought I should just chime in and say that we deployed this
> > work-around when we started observing the error back in June. Since
> > then we have not seen any crashes. Also, we have instrumented some of
> > our kernels to count the number of times the error is hit (overall +
> > consecutive). Compared to the overall number of packets, the error
> > happens very rarely. With our workloads, we on average see the error
> > once every couple of days.
>
> Thanks for letting us know!
>
> I plan to fix this in the ipsec tree with:
>
> Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
>
> Since commit 222d7dbd258d ("net: prevent dst uses after free")
> skb_dst_force() might clear the dst_entry attached to the skb.
> The xfrm code doesn't expect this to happen, so we crash with
> a NULL pointer dereference in this case. Fix it by checking
> skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> in case the dst_entry was cleared.
>
> Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> Reported-by: Tobias Hommel <redacted>
> Reported-by: Kristian Evensen <redacted>
> Reported-by: Wolfgang Walter <redacted>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/xfrm/xfrm_output.c | 4 ++++
>  net/xfrm/xfrm_policy.c | 4 ++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 89b178a78dc7..36d15a38ce5e 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -101,6 +101,10 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
>  		spin_unlock_bh(&x->lock);
>  
>  		skb_dst_force(skb);
> +		if (!skb_dst(skb)) {
> +			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +			goto error_nolock;
> +		}
>  
>  		if (xfrm_offload(skb)) {
>  			x->type_offload->encap(x, skb);
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7c5e8978aeaa..626e0f4d1749 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2548,6 +2548,10 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
>  	}
>  
>  	skb_dst_force(skb);
> +	if (!skb_dst(skb)) {
> +		XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
> +		return 0;
> +	}
>  
>  	dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
>  	if (IS_ERR(dst)) {
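The pattern in the quoted patch can be illustrated outside the kernel. What follows is only a user-space sketch, not kernel code: `struct dst`, `struct pkt`, `struct stats`, `pkt_dst_force()` and `forward_one()` are hypothetical stand-ins for `dst_entry`, `sk_buff`, the MIB counters, `skb_dst_force()` and the patched xfrm callers. It assumes that forcing a reference fails, and the pointer is cleared, once the refcount of the dst has already dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: only a reference count. */
struct dst {
	int refcnt;
};

/* Hypothetical stand-in for struct sk_buff: a dst pointer plus a flag
 * saying whether the packet already owns a reference on it. */
struct pkt {
	struct dst *dst;
	bool dst_refcounted;
};

/* Hypothetical error counter, playing the role of XfrmFwdHdrError. */
struct stats {
	unsigned long fwd_hdr_error;
};

/* Model of skb_dst_force(): turn a borrowed dst into an owned one.
 * If the dst is already dead (refcnt == 0), clear the pointer instead. */
static void pkt_dst_force(struct pkt *p)
{
	assert(p != NULL);

	if (p->dst == NULL || p->dst_refcounted)
		return;

	if (p->dst->refcnt > 0) {
		p->dst->refcnt++;
		p->dst_refcounted = true;
	} else {
		p->dst = NULL;
	}
}

/* Model of a patched caller: check the dst after forcing it and drop
 * the packet (return 0) instead of dereferencing a NULL pointer. */
static int forward_one(struct pkt *p, struct stats *st)
{
	pkt_dst_force(p);
	if (p->dst == NULL) {
		st->fwd_hdr_error++;
		return 0;
	}
	return 1;
}
```

A packet whose dst is still alive passes through `forward_one()` with the refcount raised by one. A packet whose dst has already been released is dropped and counted, which corresponds to the slowly growing error counter reported in this thread.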
This patch fixes the problem here. XfrmFwdHdrError climbs to around 80 right
at the beginning and then stays there. This probably happens while some routes
are being changed or set at that point.

Regards and thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
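
For anyone who wants to watch the same counter, a minimal sketch of reading it from user space follows. It assumes the kernel is built with xfrm statistics and exposes them in /proc/net/xfrm_stat as one "Name value" pair per line. The helper names `find_counter()` and `read_counter_file()` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up one counter in text laid out as "<Name> <value>" per line,
 * the layout assumed here for /proc/net/xfrm_stat.
 * Returns 0 and stores the value on success, -1 if the name is absent. */
static int find_counter(const char *text, const char *name,
			unsigned long *value)
{
	assert(text != NULL && name != NULL && value != NULL);

	const char *line = text;
	while (*line != '\0') {
		char key[64];
		unsigned long v;

		if (sscanf(line, "%63s %lu", key, &v) == 2 &&
		    strcmp(key, name) == 0) {
			*value = v;
			return 0;
		}

		/* Advance to the start of the next line, if any. */
		const char *nl = strchr(line, '\n');
		if (nl == NULL)
			break;
		line = nl + 1;
	}
	return -1;
}

/* Read a small statistics file into a buffer and look up one counter.
 * Returns -1 if the file cannot be opened or the name is absent. */
static int read_counter_file(const char *path, const char *name,
			     unsigned long *value)
{
	char buf[8192];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;

	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	return find_counter(buf, name, value);
}
```

Calling `read_counter_file("/proc/net/xfrm_stat", "XfrmFwdHdrError", &v)` at intervals would show whether the counter keeps growing or, as observed here, stays flat after the initial burst.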