Thread (20 messages) 20 messages, 6 authors, 2021-09-17

Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc

From: Martin KaFai Lau <hidden>
Date: 2021-09-01 17:45:52
Also in: bpf

On Wed, Sep 01, 2021 at 12:42:03PM +0200, Toke Høiland-Jørgensen wrote:
John Fastabend [off-list ref] writes:
quoted
Cong Wang wrote:
quoted
On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau [off-list ref] wrote:
quoted
Please explain more on this.  What is currently missing
to make qdisc in struct_ops possible?
I think you misunderstand this point. The reason why I avoid it is
_not_ anything is missing, quite oppositely, it is because it requires
a lot of work to implement a Qdisc with struct_ops approach, literally
all those struct Qdisc_ops (not to mention struct Qdisc_class_ops).
WIth current approach, programmers only need to implement two
eBPF programs (enqueue and dequeue).
_if_ it is using as a qdisc object/interface,
the patch "looks" easier because it obscures some of the ops/interface
from the bpf user.  The user will eventually ask for more flexibility
and then an on-par interface as the kernel's qdisc.  If there are some
common 'ops', the common bpf code can be shared as a library in userspace
or there is also kfunc call to call into the kernel implementation.
For existing kernel qdisc author,  it will be easier to use the same
interface also.
quoted
Another idea. Rather than work with qdisc objects which creates all
these issues with how to work with existing interfaces, filters, etc.
Why not create an sk_buff map? Then this can be used from the existing
egress/ingress hooks independent of the actual qdisc being used.
I agree. In fact, I'm working on doing just this for XDP, and I see no
reason why the map type couldn't be reused for skbs as well. Doing it
this way has a couple of benefits:

- It leaves more flexibility to BPF: want a simple FIFO queue? just
  implement that with a single queue map. Or do you want to build a full
  hierarchical queueing structure? Just instantiate as many queue maps
  as you need to achieve this. Etc.
Agree.  Regardless how the interface may look like,
I even think being able to queue/dequeue an skb into different bpf maps
should be the first thing to do here.  Looking forward to your patches.
- The behaviour is defined entirely by BPF program behaviour, and does
  not require setting up a qdisc hierarchy in addition to writing BPF
  code.
Interesting idea.  If it does not need to use the qdisc object/interface
and be able to do the qdisc hierarchy setup in a programmable way, it may
be nice.  It will be useful for the future patches to come with some
bpf prog examples to do that.
- It should be possible to structure the hooks in a way that allows
  reusing queueing algorithm implementations between the qdisc and XDP
  layers.
quoted
You mention skb should not be exposed to userspace? Why? Whats the
reason for this? Anyways we can make kernel only maps if we want or
scrub the data before passing it to userspace. We do this already in
some cases.
Yup, that's my approach as well.
quoted
IMO it seems cleaner and more general to allow sk_buffs
to be stored in maps and pulled back out later for enqueue/dequeue.
FWIW there's some gnarly details here (for instance, we need to make
sure the BPF program doesn't leak packet references after they are
dequeued from the map). My idea is to use a scheme similar to what we do
for XDP_REDIRECT, where a helper sets some hidden variables and doesn't
actually remove the packet from the queue until the BPF program exits
(so the kernel can make sure things are accounted correctly).
The verifier is tracking the sk's references.  Can it be reused to
track the skb's reference?
quoted
I think one trick might be how to trigger the dequeue event on
transition from stopped to running net_device or other events like
this, but that could be solved with another program attached to
those events to kick the dequeue logic.
This is actually easy in the qdisc case, I think: there's already a
qdisc_dequeue() operation, which just needs to execute a BPF program
that picks which packet to dequeue (by pulling it off a queue map). For
XDP we do need a new hook, on driver TX completion or something like
that. Details TBD. Also, we need a way to BPF to kick an idle interface
and make it start transmitting; that way we can implement a traffic
shaper (that delays packets) by using BPF timers :)

-Toke
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help