Re: [PATCH v5 bpf-next 2/9] veth: Add driver XDP
From: John Fastabend <john.fastabend@gmail.com>
Date: 2018-07-27 04:22:05
On 07/26/2018 07:40 AM, Toshiaki Makita wrote:
From: Toshiaki Makita <redacted> This is the basic implementation of veth driver XDP. Incoming packets are sent from the peer veth device in the form of skb, so this is generally doing the same thing as generic XDP. This itself is not so useful, but a starting point to implement other useful veth XDP features like TX and REDIRECT. This introduces NAPI when XDP is enabled, because XDP is now heavily relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function enqueues packets to the ring and peer NAPI handler drains the ring. Currently only one ring is allocated for each veth device, so it does not scale on multiqueue env. This can be resolved by allocating rings on the per-queue basis later. Note that NAPI is not used but netif_rx is used when XDP is not loaded, so this does not change the default behaviour. v3: - Fix race on closing the device. - Add extack messages in ndo_bpf. v2: - Squashed with the patch adding NAPI. - Implement adjust_tail. - Don't acquire consumer lock because it is guarded by NAPI. - Make poll_controller noop since it is unnecessary. - Register rxq_info on enabling XDP rather than on opening the device. Signed-off-by: Toshiaki Makita <redacted> ---
[...] One nit and one question.
+
+static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
+ struct sk_buff *skb)
+{
+ u32 pktlen, headroom, act, metalen;
+ void *orig_data, *orig_data_end;
+ int size, mac_len, delta, off;
+ struct bpf_prog *xdp_prog;
+ struct xdp_buff xdp;
+
+ rcu_read_lock();
+ xdp_prog = rcu_dereference(priv->xdp_prog);
+ if (unlikely(!xdp_prog)) {
+ rcu_read_unlock();
+ goto out;
+ }
+
+ mac_len = skb->data - skb_mac_header(skb);
+ pktlen = skb->len + mac_len;
+ size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ if (size > PAGE_SIZE)
+ goto drop;I'm not sure why it matters if size > PAGE_SIZE here. Why not just consume it and use the correct page order in alloc_page if its not linear.
+
+ headroom = skb_headroom(skb) - mac_len;
+ if (skb_shared(skb) || skb_head_is_locked(skb) ||
+ skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {
+ struct sk_buff *nskb;
+ void *head, *start;
+ struct page *page;
+ int head_off;
+
+ page = alloc_page(GFP_ATOMIC);Should also have __NO_WARN here as well this can be triggered by external events so we don't want DDOS here to flood system logs.
+ if (!page)
+ goto drop;
+
+ head = page_address(page);
+ start = head + VETH_XDP_HEADROOM;
+ if (skb_copy_bits(skb, -mac_len, start, pktlen)) {
+ page_frag_free(head);
+ goto drop;
+ }
+
+ nskb = veth_build_skb(head,
+ VETH_XDP_HEADROOM + mac_len, skb->len,
+ PAGE_SIZE);
+ if (!nskb) {
+ page_frag_free(head);
+ goto drop;
+ }
+
+ skb_copy_header(nskb, skb);
+ head_off = skb_headroom(nskb) - skb_headroom(skb);
+ skb_headers_offset_update(nskb, head_off);
+ if (skb->sk)
+ skb_set_owner_w(nskb, skb->sk);
+ consume_skb(skb);
+ skb = nskb;
+ }
+
+ xdp.data_hard_start = skb->head;
+ xdp.data = skb_mac_header(skb);
+ xdp.data_end = xdp.data + pktlen;
+ xdp.data_meta = xdp.data;
+ xdp.rxq = &priv->xdp_rxq;
+ orig_data = xdp.data;
+ orig_data_end = xdp.data_end;
+
+ act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+ switch (act) {
+ case XDP_PASS:
+ break;
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ case XDP_ABORTED:
+ trace_xdp_exception(priv->dev, xdp_prog, act);
+ case XDP_DROP:
+ goto drop;
+ }
+ rcu_read_unlock();
+
+ delta = orig_data - xdp.data;
+ off = mac_len + delta;
+ if (off > 0)
+ __skb_push(skb, off);
+ else if (off < 0)
+ __skb_pull(skb, -off);
+ skb->mac_header -= delta;
+ off = xdp.data_end - orig_data_end;
+ if (off != 0)
+ __skb_put(skb, off);
+ skb->protocol = eth_type_trans(skb, priv->dev);
+
+ metalen = xdp.data - xdp.data_meta;
+ if (metalen)
+ skb_metadata_set(skb, metalen);
+out:
+ return skb;
+drop:
+ rcu_read_unlock();
+ kfree_skb(skb);
+ return NULL;
+}
+Thanks, John