Thread: 20 messages, 6 authors, 2021-09-17

Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc

From: Martin KaFai Lau <hidden>
Date: 2021-08-24 23:47:45
Also in: netdev

On Fri, Aug 20, 2021 at 06:02:40PM -0700, Cong Wang wrote:
> From: Cong Wang <redacted>
>
> This *incomplete* patch introduces a programmable Qdisc with
> eBPF.  The goal is to make Qdisc as programmable as possible,
> that is, to replace as many existing Qdisc's as we can. ;)
>
> The design was discussed during last LPC:
> https://linuxplumbersconf.org/event/7/contributions/679/attachments/520/1188/sch_bpf.pdf
>
> Here is a summary of design decisions I made:
>
> 1. Avoid eBPF struct_ops, as it would be really hard to program
>    a Qdisc with this approach.

Please explain more on this.  What is currently missing
to make a qdisc in struct_ops possible?

> 2. Avoid exposing skb's to user-space, which means we can't introduce
>    a map to store skb's. Instead, store them in kernel without exposure
>    to user-space.
>
> So I choose to use priority queues to store skb's inside a
> flow and to store flows inside a Qdisc, and let eBPF programs
> decide the *relative* position of the skb within the flow and the
> *relative* order of the flows too, upon each enqueue and dequeue.
> Each flow is also exposed to user as a TC class, like many other
> classful Qdisc's.
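
To make the two-level ordering described above concrete, here is a minimal
user-space sketch in plain C. It is not the patch's code: every name in it
(model_qdisc, model_flow, model_pkt, model_enqueue, model_dequeue, the
capacity constants) is hypothetical, and it assumes that a lower rank is
dequeued first, which the RFC text does not state. Sorted arrays stand in
for whatever priority-queue structure the kernel side would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PKTS  8   /* hypothetical per-flow capacity */
#define MAX_FLOWS 4   /* hypothetical number of flows (TC classes) */

/* Stand-in for an skb: only an id and the rank chosen at enqueue time. */
struct model_pkt { uint32_t id; uint64_t rank; };

/* A flow holds its own rank plus packets kept sorted by ascending rank. */
struct model_flow {
	uint64_t rank;
	size_t len;
	struct model_pkt q[MAX_PKTS];
};

struct model_qdisc { struct model_flow flows[MAX_FLOWS]; };

/*
 * Enqueue: the caller plays the role of the enqueue program and supplies
 * both ranks. The packet is inserted in rank order (FIFO among equal
 * ranks) and the flow's rank is updated. Returns 0, or -1 if the flow
 * index is out of range or the flow is full.
 */
static int model_enqueue(struct model_qdisc *q, size_t flow,
			 uint64_t flow_rank, struct model_pkt pkt)
{
	struct model_flow *f;
	size_t i;

	if (flow >= MAX_FLOWS)
		return -1;
	f = &q->flows[flow];
	if (f->len == MAX_PKTS)
		return -1;

	/* Shift strictly higher-ranked packets one slot to the right. */
	for (i = f->len; i > 0 && f->q[i - 1].rank > pkt.rank; i--)
		f->q[i] = f->q[i - 1];
	f->q[i] = pkt;
	f->len++;
	f->rank = flow_rank;
	return 0;
}

/*
 * Dequeue: pick the non-empty flow with the lowest rank (lowest index
 * wins ties), then pop that flow's lowest-ranked packet.
 * Returns 0 on success, -1 if every flow is empty.
 */
static int model_dequeue(struct model_qdisc *q, struct model_pkt *out)
{
	struct model_flow *best = NULL;
	size_t i;

	assert(q != NULL && out != NULL);
	for (i = 0; i < MAX_FLOWS; i++) {
		struct model_flow *f = &q->flows[i];

		if (f->len && (!best || f->rank < best->rank))
			best = f;
	}
	if (!best)
		return -1;

	*out = best->q[0];
	for (i = 1; i < best->len; i++)
		best->q[i - 1] = best->q[i];
	best->len--;
	return 0;
}
```

In this model the program never sees absolute positions: it only hands in
ranks, and the container decides where the packet lands, which is the
"relative order" property the RFC describes.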

> Although the biggest limitation is obviously that users cannot
> traverse the packets or flows inside the Qdisc, I think
> at least they could store any global information of interest
> inside their own map, and the map can be shared between enqueue and
> dequeue. For example, users could use the skb pointer as key and
> rank as a value to find out the absolute order.
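
The side-map idea in the paragraph above can be sketched in user-space C as
well. This is only an illustration under assumptions: a real program would
presumably use a BPF hash map shared by the enqueue and dequeue programs,
while the fixed-size table and all names here (rank_map, rank_map_update,
rank_map_delete, rank_map_position) are hypothetical. "Absolute order" is
taken to mean the number of tracked packets with a strictly lower rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RANK_MAP_SLOTS 16  /* hypothetical capacity */

/* One entry: packet pointer value as key, rank as value. */
struct rank_entry { uintptr_t key; uint64_t rank; int used; };
struct rank_map { struct rank_entry slot[RANK_MAP_SLOTS]; };

/* Linear lookup; returns the entry for key or NULL if it is not tracked. */
static struct rank_entry *rank_map_find(struct rank_map *m, uintptr_t key)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < RANK_MAP_SLOTS; i++)
		if (m->slot[i].used && m->slot[i].key == key)
			return &m->slot[i];
	return NULL;
}

/* Enqueue side: record (or overwrite) the rank chosen for this packet. */
static int rank_map_update(struct rank_map *m, uintptr_t key, uint64_t rank)
{
	struct rank_entry *e = rank_map_find(m, key);
	size_t i;

	/* No existing entry: take the first free slot, if any. */
	for (i = 0; !e && i < RANK_MAP_SLOTS; i++)
		if (!m->slot[i].used)
			e = &m->slot[i];
	if (!e)
		return -1;	/* table full */
	e->key = key;
	e->rank = rank;
	e->used = 1;
	return 0;
}

/* Dequeue side: forget the packet once it has left the queue. */
static int rank_map_delete(struct rank_map *m, uintptr_t key)
{
	struct rank_entry *e = rank_map_find(m, key);

	if (!e)
		return -1;
	e->used = 0;
	return 0;
}

/* Count tracked packets with a strictly lower rank; -1 if key is unknown. */
static long rank_map_position(struct rank_map *m, uintptr_t key)
{
	struct rank_entry *e = rank_map_find(m, key);
	long pos = 0;
	size_t i;

	if (!e)
		return -1;
	for (i = 0; i < RANK_MAP_SLOTS; i++)
		if (m->slot[i].used && m->slot[i].rank < e->rank)
			pos++;
	return pos;
}
```

The position query walks every entry, so this models the bookkeeping only;
it says nothing about what would be affordable inside a verified program.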

> One of the challenges is how to interact with existing TC infra,
> for instance, if users install TC filters on this Qdisc, should
> we respect this by ignoring or rejecting eBPF enqueue program
> attached or vice versa? Should we allow users to replace each
> priority queue of a class with a regular Qdisc?
>
> Any high-level feedback is welcome. Please do not review any
> coding details until RFC tag is removed.

> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <redacted>