Re: [PATCH v3] drivers: net: prevent tun_get_user() to exceed xdp size limits
From: David Ahern <hidden>
Date: 2023-07-26 19:37:48
Also in:
linux-kernel-mentees, lkml
On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
Cc. John and Ahern On 26/07/2023 04.09, Jason Wang wrote:quoted
On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner [off-list ref] wrote:quoted
Syzkaller reported the following issue: ======================================= Too BIG xdp->frame_sz = 131072Is this a contiguous physical memory allocation? 131072 bytes equal order 5 page. Looking at tun.c code I cannot find a code path that could create order-5 skb->data, but only SKB with order-0 fragments. But I guess it is the netif_receive_generic_xdp() what will realloc to make this linear (via skb_linearize())
get_tun_user is passed an iov_iter with a single segment of 65007 total_len. The alloc_skb path is hit with an align size of only 64. That is insufficient for XDP so the netif_receive_generic_xdp hits the pskb_expand_head path. Something is off in the math in netif_receive_generic_xdp resulting in the skb markers being off. That causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
quoted hunk ↗ jump to hunk
quoted
quoted
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 ... Call Trace: <TASK> bpf_prog_4add87e5301a4105+0x1a/0x1c __bpf_prog_run include/linux/filter.h:600 [inline] bpf_prog_run_xdp include/linux/filter.h:775 [inline] bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 netif_receive_generic_xdp net/core/dev.c:4807 [inline] do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x650/0xe40 fs/read_write.c:584 ksys_write+0x12f/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But tun_get_user() still provides an execution path with do_xdp_generic() and exceed XDP limits for packet size.I added this check and maybe it is too strict. XDP can work on higher order pages, as long as this is contiguous physical memory (e.g. a page). And An order 5 page (131072 bytes) seems excessive, but maybe TUN have a use-case for having such large packets? (Question to Ahern?) I'm considering we should change the size-limit to order-2 (16384) or order-3 (32768). Order-3 because netstack have: #define SKB_FRAG_PAGE_ORDER get_order(32768) And order-2 because netstack have: SKB_MAX_ALLOC (16KiB) - See discussion in commit 6306c1189e77 ("bpf: Remove MTU check in __bpf_skb_max_len"). - https://git.kernel.org/torvalds/c/6306c1189e77quoted
quoted
Using the syzkaller repro with reduced packet size it was also discovered that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(), although pad may be incremented in tun_build_skb(). If we move the limit check from tun_can_build_skb() to tun_build_skb() we will make xdp to be used only in tun_build_skb(), without falling in tun_alloc_skb(), etc. And moreover we will drop the packet which can't be processed in tun_build_skb().Looking at tun_build_skb() is uses the page_frag system, and can thus create up-to SKB_FRAG_PAGE_ORDER (size 32768 / order-3).quoted
quoted
Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ (local) Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") Signed-off-by: Andrew Kanner <redacted> --- Notes: V2 -> V3: * attach the forgotten changelog V1 -> V2: * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with syzkaller repro and missing XDP_PACKET_HEADROOM in pad * changed the title and description of the execution path, suggested by Jason Wang [off-list ref] * move the limit check from tun_can_build_skb() to tun_build_skb() to remove duplication and locking issue, and also drop the packet in case of a failed check - noted by Jason Wang [off-list ref]Acked-by: Jason Wang <jasowang@redhat.com> Thanksquoted
drivers/net/tun.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d75456adc62a..7c2b05ce0421 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c@@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(structtun_struct *tun, struct tun_file *tfile, if (zerocopy) return false; - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) - return false; - return true; }@@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(structtun_struct *tun, buflen += SKB_DATA_ALIGN(len + pad); rcu_read_unlock(); + if (buflen > PAGE_SIZE) + return ERR_PTR(-EFAULT);Concretely I'm saying maybe use SKB_FRAG_PAGE_ORDER "size" here? e.g. create SKB_FRAG_PAGE_SIZE define as below. if (buflen > SKB_FRAG_PAGE_SIZE)diff --git a/include/net/sock.h b/include/net/sock.h index 656ea89f60ff..4c4b3c257b52 100644 --- a/include/net/sock.h +++ b/include/net/sock.h@@ -2886,7 +2886,8 @@ extern int sysctl_optmem_max;extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; -#define SKB_FRAG_PAGE_ORDER get_order(32768) +#define SKB_FRAG_PAGE_SIZE 32768 +#define SKB_FRAG_PAGE_ORDER get_order(SKB_FRAG_PAGE_SIZE) DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);quoted
quoted
+ alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) return ERR_PTR(-ENOMEM);--Jesper