Re: [PATCH v3] bpf, sockmap: keep sk_msg copy state in sync
From: John Fastabend <john.fastabend@gmail.com>
Date: 2026-05-20 16:59:17
Also in:
bpf, lkml, stable
On Wed, May 20, 2026 at 11:09:36AM +0000, bot+bpf-ci@kernel.org wrote:
quoted
diff --git a/net/core/filter.c b/net/core/filter.c index 9590877b0714..223d78708486 100644 --- a/net/core/filter.c +++ b/net/core/filter.c[ ... ]quoted
@@ -2794,6 +2835,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start, { struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge; u32 new, i = 0, l = 0, space, copy = 0, offset = 0; + bool sge_copy = false, nsge_copy = false, nnsge_copy = false; + bool rsge_copy = false; u8 *raw, *to, *from; struct page *page;@@ -2866,6 +2909,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start, sk_msg_iter_var_prev(i); psge = sk_msg_elem(msg, i); rsge = sk_msg_elem_cpy(msg, i); + rsge_copy = sk_msg_elem_is_copy(msg, i); psge->length = start - offset; rsge.length -= psge->length;
Zhang, Did you want to push two smallish patches to fix these two? I have had similar patches on my test system for a few days so they LGTM. These two fixes
page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC |
__GFP_COMP | __GFP_ZERO,
get_order(copy + len));and
rsge.offset += start - offset;
bot must be reading reviews because I called that out in v2. [...]
Could the ring buffer state become corrupted when data is appended exactly at the end of the message and the ring buffer is full? If data is appended exactly at the end of the message (start == offset + l), the loop finishes with i pointing to the unused slot msg->sg.end. If the ring buffer is full (!space), the coalescing fallback logic reads: copy = msg->sg.data[i].length Because i points to an unused slot, its length is 0, making copy == 0, and the fallback coalescing block is bypassed. The function then unconditionally calls: sk_msg_iter_next(msg, end) which increments msg->sg.end. Since the ring buffer was already full, advancing msg->sg.end causes it to wrap around and equal msg->sg.start. This could corrupt the ring buffer state and overwrite the first element without calling put_page(), leading to page leaks.
^^^ This one I haven't looked into yet. Let me know if you have time to get to above two issues this week would be great to get a v1 out at least. Thanks, John