Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc
From: Martin KaFai Lau <hidden>
Date: 2021-08-24 23:47:45
Also in: netdev
On Fri, Aug 20, 2021 at 06:02:40PM -0700, Cong Wang wrote:
> From: Cong Wang <redacted>
>
> This *incomplete* patch introduces a programmable Qdisc with eBPF. The goal is to make Qdisc as programmable as possible, that is, to replace as many existing Qdisc's as we can. ;)
>
> The design was discussed during last LPC:
> https://linuxplumbersconf.org/event/7/contributions/679/attachments/520/1188/sch_bpf.pdf
>
> Here is a summary of design decisions I made:
>
> 1. Avoid eBPF struct_ops, as it would be really hard to program a Qdisc with this approach.
Please explain more on this. What is currently missing to make qdisc in struct_ops possible?
> 2. Avoid exposing skb's to user-space, which means we can't introduce a map to store skb's. Instead, store them in kernel without exposure to user-space.
>
> So I choose to use priority queues to store skb's inside a flow and to store flows inside a Qdisc, and let eBPF programs decide the *relative* position of the skb within the flow and the *relative* order of the flows too, upon each enqueue and dequeue. Each flow is also exposed to user as a TC class, like many other classful Qdisc's.
>
> Although the biggest limitation is obviously that users cannot traverse the packets or flows inside the Qdisc, I think at least they could store any global information of interest inside their own map, and the map can be shared between enqueue and dequeue. For example, users could use the skb pointer as key and the rank as value to find out the absolute order.
>
> One of the challenges is how to interact with existing TC infra. For instance, if users install TC filters on this Qdisc, should we respect this by ignoring or rejecting the eBPF enqueue program attached, or vice versa? Should we allow users to replace each priority queue of a class with a regular Qdisc?
>
> Any high-level feedback is welcome. Please do not review any coding details until the RFC tag is removed.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <redacted>
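
To make the two-level design described above easier to discuss, here is a rough user-space sketch in plain C. It is not taken from the patch and is not kernel code. All names (struct pkt, struct flow, struct sched, rank_fn, sched_enqueue, sched_dequeue, demo) are made up for illustration. struct pkt stands in for an skb, the two function pointers stand in for the eBPF programs that return a rank, and a small sorted array stands in for each priority queue. The policy in demo() (packets ordered by a prio field, flows ordered by flow index) is just one arbitrary example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 8
#define MAX_PKTS  16

/* Stand-in for an skb; only the fields the sketch needs. */
struct pkt {
	uint32_t id;
	uint32_t flow;   /* index of the flow (TC class) it belongs to */
	uint32_t prio;   /* input to the example rank function */
	uint64_t rank;   /* relative position inside its flow */
};

/* One flow: packets kept sorted by ascending rank. */
struct flow {
	uint64_t rank;   /* relative position among flows */
	size_t len;
	struct pkt *q[MAX_PKTS];
};

/* Hypothetical hook type, standing in for an eBPF program returning a rank. */
typedef uint64_t (*rank_fn)(const struct pkt *p);

struct sched {
	struct flow flows[MAX_FLOWS];
	rank_fn pkt_rank;   /* orders packets inside a flow */
	rank_fn flow_rank;  /* orders flows, re-evaluated on each enqueue */
};

/* Insert p into its flow at the position given by its rank. */
static int sched_enqueue(struct sched *s, struct pkt *p)
{
	if (p->flow >= MAX_FLOWS)
		return -1;
	struct flow *f = &s->flows[p->flow];
	if (f->len == MAX_PKTS)
		return -1;   /* queue full: caller would drop the packet */

	p->rank = s->pkt_rank(p);
	size_t i = f->len;
	/* Shift higher-ranked packets up; equal ranks stay in FIFO order. */
	while (i > 0 && f->q[i - 1]->rank > p->rank) {
		f->q[i] = f->q[i - 1];
		i--;
	}
	f->q[i] = p;
	f->len++;
	f->rank = s->flow_rank(p);
	return 0;
}

/* Pick the non-empty flow with the lowest rank, then pop its head. */
static struct pkt *sched_dequeue(struct sched *s)
{
	struct flow *best = NULL;
	for (size_t i = 0; i < MAX_FLOWS; i++) {
		struct flow *f = &s->flows[i];
		if (f->len && (!best || f->rank < best->rank))
			best = f;
	}
	if (!best)
		return NULL;

	struct pkt *p = best->q[0];
	for (size_t i = 1; i < best->len; i++)
		best->q[i - 1] = best->q[i];
	best->len--;
	return p;
}

/* Example policy: lower prio value first; lower flow index first. */
static uint64_t rank_by_prio(const struct pkt *p) { return p->prio; }
static uint64_t rank_by_flow(const struct pkt *p) { return p->flow; }

/* Enqueue four packets, then record the ids in dequeue order. */
static size_t demo(uint32_t out[4])
{
	static struct pkt pkts[4] = {
		{ .id = 1, .flow = 1, .prio = 5 },
		{ .id = 2, .flow = 0, .prio = 9 },
		{ .id = 3, .flow = 1, .prio = 2 },
		{ .id = 4, .flow = 0, .prio = 3 },
	};
	struct sched s = { .pkt_rank = rank_by_prio, .flow_rank = rank_by_flow };
	size_t n = 0;
	struct pkt *p;

	for (size_t i = 0; i < 4; i++) {
		int ret = sched_enqueue(&s, &pkts[i]);
		assert(ret == 0);
		(void)ret;
	}
	while (n < 4 && (p = sched_dequeue(&s)) != NULL)
		out[n++] = p->id;
	return n;
}
```

With this example policy, demo() drains flow 0 before flow 1 and orders each flow by prio, so the ids come out as 4, 2, 3, 1. A real implementation would presumably use a proper priority-queue structure rather than a sorted array, and how the rank callbacks map onto actual eBPF program types is exactly the open question in this thread.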