Thread (20 messages) 20 messages, 6 authors, 2021-09-17

Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc

From: Toke Høiland-Jørgensen <hidden>
Date: 2021-09-06 11:45:50
Also in: bpf

Cong Wang [off-list ref] writes:
On Wed, Sep 1, 2021 at 3:42 AM Toke Høiland-Jørgensen [off-list ref] wrote:
quoted
John Fastabend [off-list ref] writes:
quoted
Cong Wang wrote:
quoted
On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau [off-list ref] wrote:
quoted
Please explain more on this.  What is currently missing
to make qdisc in struct_ops possible?
I think you misunderstand this point. The reason why I avoid it is
_not_ anything is missing, quite oppositely, it is because it requires
a lot of work to implement a Qdisc with struct_ops approach, literally
all those struct Qdisc_ops (not to mention struct Qdisc_class_ops).
WIth current approach, programmers only need to implement two
eBPF programs (enqueue and dequeue).

Thanks.
Another idea. Rather than work with qdisc objects which creates all
these issues with how to work with existing interfaces, filters, etc.
Why not create an sk_buff map? Then this can be used from the existing
egress/ingress hooks independent of the actual qdisc being used.
I agree. In fact, I'm working on doing just this for XDP, and I see no
reason why the map type couldn't be reused for skbs as well. Doing it
this way has a couple of benefits:
I do see a lot of reasons, for starters, struct skb_buff is very different
from struct xdp_buff, any specialized map can not be reused. I guess you
are using a generic one, how do you handle the refcnt at least for skb?
Well, you can't keep XDP frames and skbs in the same map instance, but
you can create a map type that can be instantiated to hold either type
and otherwise keep the same semantics. The map can just inc/dec the
refcnt as skbs are added/removed from it.
quoted
- It leaves more flexibility to BPF: want a simple FIFO queue? just
  implement that with a single queue map. Or do you want to build a full
  hierarchical queueing structure? Just instantiate as many queue maps
  as you need to achieve this. Etc.
Please give an example without a queue. ;) Queue is too simple, show us
something more useful please. How do you plan to re-implement EDT with
just queues?
I'm using 'queue' as a shorthand for any queueing/scheduling algorithm
implementable by a qdisc. We need to cover them all, obviously, not just
FIFO queues (in fact I think we should actively be discouraging those,
but that's a different story :) )

For EDT it would be something like:

- On enqueue, stick frames into the map with a rank corresponding to
  their transmission time (the map implements the PIFO queue, just like
  your patch).

- (re-)arm a BPF timer to fire at the time of the next transmission
  event, and have that timer trigger interface TX.

The first bit is straight-forward, and that last bit needs a new helper
or something like it. For qdiscs I guess we could just expose
qdisc_watchdog()? For XDP we'd need something new...

quoted
- The behaviour is defined entirely by BPF program behaviour, and does
  not require setting up a qdisc hierarchy in addition to writing BPF
  code.
I have no idea why you call this a benefit, because my goal is to
replace Qdisc's, not to replace any other things. You know there are
plenty of Qdisc's which are not implemented in Linux kernel.
It's a benefit because it means you can keep everything together. I.e.,
you don't need to *both* write BPF code implementing your qdisc, *and* a
setup script to build the qdisc hierarchy. That simplifies deployment.

I suppose we could support inserting BPF qdiscs into a qdisc hierarchy
as well if needed. I don't personally see much use for that, but if
there's a use case, sure, why not?
quoted
- It should be possible to structure the hooks in a way that allows
  reusing queueing algorithm implementations between the qdisc and XDP
  layers.
XDP has no skb but xdp_buff, no? And again, why only queues?
See above :)

-Toke
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help