Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc
From: Toke Høiland-Jørgensen <hidden>
Date: 2021-09-06 11:45:50
Also in:
bpf
Cong Wang [off-list ref] writes:
On Wed, Sep 1, 2021 at 3:42 AM Toke Høiland-Jørgensen [off-list ref] wrote:quoted
John Fastabend [off-list ref] writes:quoted
Cong Wang wrote:quoted
On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau [off-list ref] wrote:quoted
Please explain more on this. What is currently missing to make qdisc in struct_ops possible?I think you misunderstand this point. The reason why I avoid it is _not_ anything is missing, quite oppositely, it is because it requires a lot of work to implement a Qdisc with struct_ops approach, literally all those struct Qdisc_ops (not to mention struct Qdisc_class_ops). WIth current approach, programmers only need to implement two eBPF programs (enqueue and dequeue). Thanks.Another idea. Rather than work with qdisc objects which creates all these issues with how to work with existing interfaces, filters, etc. Why not create an sk_buff map? Then this can be used from the existing egress/ingress hooks independent of the actual qdisc being used.I agree. In fact, I'm working on doing just this for XDP, and I see no reason why the map type couldn't be reused for skbs as well. Doing it this way has a couple of benefits:I do see a lot of reasons, for starters, struct skb_buff is very different from struct xdp_buff, any specialized map can not be reused. I guess you are using a generic one, how do you handle the refcnt at least for skb?
Well, you can't keep XDP frames and skbs in the same map instance, but you can create a map type that can be instantiated to hold either type and otherwise keep the same semantics. The map can just inc/dec the refcnt as skbs are added/removed from it.
quoted
- It leaves more flexibility to BPF: want a simple FIFO queue? just implement that with a single queue map. Or do you want to build a full hierarchical queueing structure? Just instantiate as many queue maps as you need to achieve this. Etc.Please give an example without a queue. ;) Queue is too simple, show us something more useful please. How do you plan to re-implement EDT with just queues?
I'm using 'queue' as a shorthand for any queueing/scheduling algorithm implementable by a qdisc. We need to cover them all, obviously, not just FIFO queues (in fact I think we should actively be discouraging those, but that's a different story :) ) For EDT it would be something like: - On enqueue, stick frames into the map with a rank corresponding to their transmission time (the map implements the PIFO queue, just like your patch). - (re-)arm a BPF timer to fire at the time of the next transmission event, and have that timer trigger interface TX. The first bit is straight-forward, and that last bit needs a new helper or something like it. For qdiscs I guess we could just expose qdisc_watchdog()? For XDP we'd need something new...
quoted
- The behaviour is defined entirely by BPF program behaviour, and does not require setting up a qdisc hierarchy in addition to writing BPF code.I have no idea why you call this a benefit, because my goal is to replace Qdisc's, not to replace any other things. You know there are plenty of Qdisc's which are not implemented in Linux kernel.
It's a benefit because it means you can keep everything together. I.e., you don't need to *both* write BPF code implementing your qdisc, *and* a setup script to build the qdisc hierarchy. That simplifies deployment. I suppose we could support inserting BPF qdiscs into a qdisc hierarchy as well if needed. I don't personally see much use for that, but if there's a use case, sure, why not?
quoted
- It should be possible to structure the hooks in a way that allows reusing queueing algorithm implementations between the qdisc and XDP layers.XDP has no skb but xdp_buff, no? And again, why only queues?
See above :) -Toke