sk_psock_skb_ingress_enqueue() maps a received message into a scatterlist
with skb_to_sgvec(skb, sg, off, len). On the SK_SKB strparser path, off and
len come from the message's strp_msg (stm->offset and stm->full_len), which
the stream parser sets. strparser does not trim the skb, so normally
skb->len - off >= full_len and the range [off, off + len) lies within the
skb.

An SK_SKB verdict (or parser) program may call bpf_skb_change_tail() and
shrink the skb after full_len was recorded. len then covers more bytes than
the skb holds, so __skb_to_sgvec() runs out of data while len is still
nonzero and trips BUG_ON(len):

kernel BUG at net/core/skbuff.c:5286!
RIP: 0010:__skb_to_sgvec+0x78c/0x790
Call Trace:
<IRQ>
skb_to_sgvec+0x32/0x90
sk_psock_skb_ingress_enqueue+0x42/0x370
sk_psock_skb_ingress_self+0x1a8/0x200
sk_psock_verdict_apply+0x33c/0x360
sk_psock_strp_read+0x24a/0x370
__strp_recv+0x66d/0xda0
__tcp_read_sock+0x13d/0x590
tcp_bpf_strp_read_sock+0x195/0x320
strp_data_ready+0x267/0x340
sk_psock_strp_data_ready+0x1ce/0x350
tcp_data_queue+0x1364/0x2fd0
</IRQ>
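
To see why a stale len ends in BUG_ON(len), here is a simplified userspace
model of the walk __skb_to_sgvec() performs. It is a sketch under stated
assumptions, not kernel code: struct model_skb, struct model_sg and
model_skb_to_sgvec() are hypothetical names, the skb is reduced to a list of
segment lengths (linear head first, then frags), and frag_list recursion and
the -EMSGSIZE checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an skb's data: segment 0 is the linear head, the
 * rest are page frags. Only the lengths matter for this sketch. */
struct model_skb {
	unsigned int seg_len[4];
	size_t nr_segs;
};

/* Hypothetical scatterlist entry: segment index, offset within it, size. */
struct model_sg {
	size_t seg;
	unsigned int off;
	unsigned int len;
};

/*
 * Simplified model of the __skb_to_sgvec() walk, without frag_list
 * recursion or -EMSGSIZE checks. Maps bytes [off, off + len) of the skb
 * onto sg, which must have room for skb->nr_segs entries. Returns the
 * number of entries filled and stores the unmapped byte count in *left;
 * the real function hits BUG_ON(len) when that count is nonzero.
 */
static int model_skb_to_sgvec(const struct model_skb *skb,
			      struct model_sg *sg, unsigned int off,
			      unsigned int len, unsigned int *left)
{
	unsigned int start = 0;	/* skb offset where segment i begins */
	int elt = 0;

	assert(skb != NULL && sg != NULL && left != NULL);
	for (size_t i = 0; i < skb->nr_segs && len > 0; i++) {
		unsigned int end = start + skb->seg_len[i];

		if (end > off) {	/* segment i holds bytes at or after off */
			unsigned int copy = end - off;

			if (copy > len)
				copy = len;
			sg[elt].seg = i;
			sg[elt].off = off - start;
			sg[elt].len = copy;
			elt++;
			len -= copy;
			off += copy;
		}
		start = end;
	}
	*left = len;	/* still nonzero: the skb ran out before len did */
	return elt;
}
```

For a 100-byte skb split 40/60, off = 10 and len = 50 map cleanly into two
entries with nothing left over. If the skb is shrunk to a single 30-byte
segment but len stays 50, the walk maps 20 bytes and returns with 30 bytes
unmapped, which is the state in which the real function hits BUG_ON(len).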

Clamp len to skb->len - off, and drop the message if off is already at or
past the end of the skb. sk_psock_skb_ingress_enqueue() is the only
skb_to_sgvec() caller in skmsg, and both ingress paths (verdict SK_PASS and
the backlog worker) reach it, so checking there covers both. The clamp is a
no-op unless the skb was shrunk.

Fixes: 7303524e04af ("skmsg: Lose offset info in sk_psock_skb_ingress")
Signed-off-by: Sechang Lim <redacted>
---
 net/core/skmsg.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e1850caf1a71..2961178ebd1e 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -550,6 +550,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 {
 	int num_sge, copied;
 
+	if (off >= skb->len)
+		return -EINVAL;
+	len = min_t(u32, len, skb->len - off);
+
 	/* skb_to_sgvec will fail when the total number of fragments in
 	 * frag_list and frags exceeds MAX_MSG_FRAGS. For example, the
 	 * caller may aggregate multiple skbs.
-- 
2.43.0