Re: [RFC PATCH bpf-next 3/3] samples/bpf: Add a simple bridge example accelerated with XDP

From: Yoshiki Komachi <hidden>
Date: 2020-08-04 10:09:09
Also in: bpf, bridge

On 2020/07/31 23:15, Jesper Dangaard Brouer [off-list ref] wrote:


I really appreciate that you are working on adding this helper.
Some comments below.

Thanks! Please find my responses below.

On Fri, 31 Jul 2020 13:44:20 +0900
Yoshiki Komachi [off-list ref] wrote:

quoted

diff --git a/samples/bpf/xdp_bridge_kern.c b/samples/bpf/xdp_bridge_kern.c
new file mode 100644
index 000000000000..00f802503199
--- /dev/null
+++ b/samples/bpf/xdp_bridge_kern.c

@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 NTT Corp. All Rights Reserved.
+ *

[...]

quoted

+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__uint(max_entries, 64);
+} xdp_tx_ports SEC(".maps");
+
+static __always_inline int xdp_bridge_proto(struct xdp_md *ctx, u16 br_vlan_proto)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	struct bpf_fdb_lookup fdb_lookup_params;
+	struct vlan_hdr *vlan_hdr = NULL;
+	struct ethhdr *eth = data;
+	u16 h_proto;
+	u64 nh_off;
+	int rc;
+
+	nh_off = sizeof(*eth);
+	if (data + nh_off > data_end)
+		return XDP_DROP;
+
+	__builtin_memset(&fdb_lookup_params, 0, sizeof(fdb_lookup_params));
+
+	h_proto = eth->h_proto;
+
+	if (unlikely(ntohs(h_proto) < ETH_P_802_3_MIN))
+		return XDP_PASS;
+
+	/* Handle VLAN tagged packet */
+	if (h_proto == br_vlan_proto) {
+		vlan_hdr = (void *)eth + nh_off;
+		nh_off += sizeof(*vlan_hdr);
+		if ((void *)eth + nh_off > data_end)
+			return XDP_PASS;
+
+		fdb_lookup_params.vlan_id = ntohs(vlan_hdr->h_vlan_TCI) &
+					VLAN_VID_MASK;
+	}
+
+	/* FIXME: Although Linux bridge provides us with vlan filtering (contains
+	 * PVID) at ingress, the feature is currently unsupported in this XDP program.
+	 *
+	 * Two ideas to realize the vlan filtering are below:
+	 *   1. usespace daemon monitors bridge vlan events and notifies XDP programs

                  ^^
Typo: usespace -> userspace

I will fix this in the next version.

quoted

+	 *      of them through BPF maps
+	 *   2. introduce another bpf helper to retrieve bridge vlan information

The comment appears two times in this file.

With the above comment, I meant to show that vlan filtering at ingress (not egress)
will need to be implemented here in the future.
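
To make the missing ingress step concrete, here is a rough userspace C model of
ingress vlan filtering with a PVID. It is only a sketch under my own assumptions,
not kernel or patch code: every model_* name is hypothetical, the TCI is taken in
host byte order, and the PVID is assumed to have been added to the port's member
set as well.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VID_MASK 0x0fff	/* low 12 bits of the TCI carry the VID */
#define MODEL_N_VID    4096

/* Hypothetical per-port vlan state, e.g. as a daemon might mirror it. */
struct model_port_vlans {
	uint16_t pvid;				/* 0 means no PVID configured */
	uint8_t  member[MODEL_N_VID / 8];	/* bitmap of allowed VIDs */
};

static void model_vlan_add(struct model_port_vlans *p, uint16_t vid)
{
	p->member[vid / 8] |= (uint8_t)(1u << (vid % 8));
}

static bool model_vlan_is_member(const struct model_port_vlans *p, uint16_t vid)
{
	return (p->member[vid / 8] & (1u << (vid % 8))) != 0;
}

/* Returns true and stores the classified VID in *vid if the frame is admitted. */
static bool model_ingress_filter(const struct model_port_vlans *p, bool tagged,
				 uint16_t tci, uint16_t *vid)
{
	uint16_t v = tagged ? (uint16_t)(tci & MODEL_VID_MASK) : 0;

	if (v == 0) {
		/* Untagged or priority-tagged frame: classify it into the PVID. */
		if (p->pvid == 0)
			return false;
		v = p->pvid;
	}
	if (!model_vlan_is_member(p, v))
		return false;
	*vid = v;
	return true;
}
```

In an XDP program the equivalent state would have to come from a BPF map or a
new helper, which are the two ideas listed in the FIXME comment.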

quoted

+	 *
+	 *
+	 * FIXME: After the vlan filtering, learning feature is required here, but
+	 * it is currently unsupported as well. If another bpf helper for learning
+	 * is accepted, the processing could be implemented in the future.
+	 */
+
+	memcpy(&fdb_lookup_params.addr, eth->h_dest, ETH_ALEN);
+
+	/* Note: This program definitely takes ifindex of ingress interface as
+	 * a bridge port. Linux networking devices can be stacked and physical
+	 * interfaces are not necessarily slaves of bridges (e.g., bonding or
+	 * vlan devices can be slaves of bridges), but stacked bridge ports are
+	 * currently unsupported in this program. In such cases, XDP programs
+	 * should be attached to a lower device in order to process packets with
+	 * higher speed. Then, a new bpf helper to find upper devices will be
+	 * required here in the future because they will be registered on FDB
+	 * in the kernel.
+	 */
+	fdb_lookup_params.ifindex = ctx->ingress_ifindex;
+
+	rc = bpf_fdb_lookup(ctx, &fdb_lookup_params, sizeof(fdb_lookup_params), 0);
+	if (rc != BPF_FDB_LKUP_RET_SUCCESS) {
+		/* In cases of flooding, XDP_PASS will be returned here */
+		return XDP_PASS;
+	}
+
+	/* FIXME: Although Linux bridge provides us with vlan filtering (contains
+	 * untagged policy) at egress as well, the feature is currently unsupported
+	 * in this XDP program.
+	 *
+	 * Two ideas to realize the vlan filtering are below:
+	 *   1. usespace daemon monitors bridge vlan events and notifies XDP programs
+	 *      of them through BPF maps
+	 *   2. introduce another bpf helper to retrieve bridge vlan information
+	 */

(2nd time the comment appears)

The second one marks where the egress filtering will be implemented in the future.

Sorry for the confusion. I will try to remove the redundancy.

quoted

A comment about the bpf_redirect_map() call below would be good, explaining
that we depend on fallback behavior to let the normal bridge code handle
other cases (e.g. flood/broadcast), and also that if the lookup fails,
XDP_PASS/fallback happens as well.

In this example, flooded packets are handed to the normal bridge in the upper layer
not by the bpf_redirect_map() call but by the XDP_PASS action, as shown below:

+	rc = bpf_fdb_lookup(ctx, &fdb_lookup_params, sizeof(fdb_lookup_params), 0);
+	if (rc != BPF_FDB_LKUP_RET_SUCCESS) {
+		/* In cases of flooding, XDP_PASS will be returned here */
+		return XDP_PASS;
+	}

Thus, IMO, such a comment belongs at the lookup shown above rather than at bpf_redirect_map().
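
To spell out the two fallback points, here is a small userspace C model of the
forwarding decision. It is a sketch under my assumptions, not the sample itself:
the model_* names are hypothetical, the FDB lookup result is reduced to a
found/not-found flag plus an egress ifindex, and the devmap is modeled as a plain
array. It also assumes that bpf_redirect_map() returns the action given in its
flags argument (XDP_PASS here) when the key is missing from the map.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for XDP action codes; the values mirror enum xdp_action. */
enum model_action { MODEL_XDP_PASS = 2, MODEL_XDP_REDIRECT = 4 };

/* Hypothetical result of the FDB lookup step. */
struct model_fdb_result {
	bool found;		/* false: flooding case or any lookup failure */
	int egress_ifindex;	/* valid only when found is true */
};

/* Hypothetical stand-in for the xdp_tx_ports devmap. */
struct model_tx_ports {
	const int *ifindexes;
	size_t count;
};

static bool model_port_present(const struct model_tx_ports *ports, int ifindex)
{
	for (size_t i = 0; i < ports->count; i++)
		if (ports->ifindexes[i] == ifindex)
			return true;
	return false;
}

static enum model_action
model_bridge_decision(const struct model_fdb_result *fdb,
		      const struct model_tx_ports *ports)
{
	/* Fallback 1: no FDB entry, so the normal bridge floods the packet. */
	if (!fdb->found)
		return MODEL_XDP_PASS;

	/* Fallback 2: egress port is not in the devmap, so pass it up too. */
	if (!model_port_present(ports, fdb->egress_ifindex))
		return MODEL_XDP_PASS;

	/* Fast path: known destination on a port that XDP can transmit on. */
	return MODEL_XDP_REDIRECT;
}
```

Only the last return corresponds to a packet that bypasses the normal bridge; both
other paths leave the handling to the kernel bridge code.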

Thanks & Best regards,

quoted

+	return bpf_redirect_map(&xdp_tx_ports, fdb_lookup_params.ifindex, XDP_PASS);
+}
+
+SEC("xdp_bridge")
+int xdp_bridge_prog(struct xdp_md *ctx)
+{
+	return xdp_bridge_proto(ctx, 0);
+}
+
+SEC("xdp_8021q_bridge")
+int xdp_8021q_bridge_prog(struct xdp_md *ctx)
+{
+	return xdp_bridge_proto(ctx, htons(ETH_P_8021Q));
+}
+
+SEC("xdp_8021ad_bridge")
+int xdp_8021ad_bridge_prog(struct xdp_md *ctx)
+{
+	return xdp_bridge_proto(ctx, htons(ETH_P_8021AD));
+}
+
+char _license[] SEC("license") = "GPL";


-- 
Best regards,
 Jesper Dangaard Brouer
 MSc.CS, Principal Kernel Engineer at Red Hat
 LinkedIn: http://www.linkedin.com/in/brouer

—
Yoshiki Komachi
komachi.yoshiki@gmail.com
