Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc
From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2021-09-03 15:33:35
Also in: bpf
On 2021-09-03 10:44 a.m., Toke Høiland-Jørgensen wrote:
> Martin KaFai Lau [off-list ref] writes:
>> On Fri, Sep 03, 2021 at 12:27:52AM +0200, Toke Høiland-Jørgensen wrote:
>>> The question is if it's useful to provide the full struct_ops for qdiscs? Having it would allow a BPF program to implement that interface towards userspace (things like statistics, classes etc), but the question is if anyone is going to bother with that given the wealth of BPF-specific introspection tools already available?
>>
>> Instead of bpftool can only introspect bpf qdisc and the existing tc can only introspect kernel qdisc, it will be nice to have bpf qdisc work as other qdisc and showing details together with others in tc. e.g. a bpf qdisc export its data/stats with its btf-id to tc and have tc print it out in a generic way?
>
> I'm not opposed to the idea, certainly. I just wonder if people who go to the trouble of writing a custom qdisc in BPF will feel it's worth it to do the extra work to make this available via a second API. We could certainly encourage it, and some things are easy (drop and pkt counters, etc), but other things (like class stats) will depend on the semantics of the qdisc being implemented, so will require extra work from the BPF qdisc developer...

The idea of using btf to overcome the domain difference is _very_ appealing but sounds like a lot of work? Haven't delved enough into btf - but wondering if the same could be stated for filters and actions...

Note: Aside from the existing tooling being well understood, a challenge you will face is reinventing all the infrastructure that tc qdiscs have taken care of over the years. Examples: proper integration with softirqs and multiprocessor protections, irqs, timers, etc., which take care of smooth triggering of enqueue/dequeue; deferring things when the target device/hw is busy; hierarchies; etc. Not saying it is the most perfect or performant, but it is one of those 'day 3' deployments, i.e. a lot of corner cases are taken care of. I noticed you mentioned some of those things in one of your emails.

For this reason, Cong's approach looks appealing because it reuses said infra. The main thing that needs extensibility is the enqueue/dequeue ops as ebpf progs. Allowing enq/deq to be ebpf specific sounds like it will allow one scheme that works for both tc and XDP (with enq/deq taking care of the buffer contextual differences). I admit XDP is a little harder than plain tc....

cheers,
jamal
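
To illustrate the split described above (shared infrastructure stays fixed, only enqueue/dequeue are pluggable), here is a minimal userspace sketch in plain C. It is not kernel code and not taken from Cong's patch: every name in it (`sketch_qdisc`, `sketch_ops`, `sketch_xmit`, `sketch_run`, `prio_ops`) is hypothetical, and the function pointers merely stand in for where the ebpf progs would be attached.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8  /* hypothetical fixed queue limit */

struct pkt { int id; int prio; };  /* lower prio value = more urgent */

struct sketch_qdisc;

/* Pluggable part: stand-ins for the enqueue/dequeue ebpf progs. */
struct sketch_ops {
    int (*enqueue)(struct sketch_qdisc *q, struct pkt *p);
    struct pkt *(*dequeue)(struct sketch_qdisc *q);
};

/* Fixed part: storage and counters owned by the shared framework. */
struct sketch_qdisc {
    const struct sketch_ops *ops;
    struct pkt *slots[QLEN];
    size_t len;
    unsigned long drops;
};

/* Framework entry point: calls the policy and does generic drop accounting. */
int sketch_xmit(struct sketch_qdisc *q, struct pkt *p)
{
    int ret = q->ops->enqueue(q, p);
    if (ret != 0)
        q->drops++;
    return ret;
}

/* Framework entry point: only asks the policy when something is queued. */
struct pkt *sketch_run(struct sketch_qdisc *q)
{
    return q->len ? q->ops->dequeue(q) : NULL;
}

/* Example policy: append on enqueue, refuse when full. */
static int prio_enqueue(struct sketch_qdisc *q, struct pkt *p)
{
    if (q->len == QLEN)
        return -1;
    q->slots[q->len++] = p;
    return 0;
}

/* Example policy: strict priority, arrival order among equal priorities. */
static struct pkt *prio_dequeue(struct sketch_qdisc *q)
{
    size_t best = 0;

    assert(q->len > 0);
    for (size_t i = 1; i < q->len; i++)
        if (q->slots[i]->prio < q->slots[best]->prio)
            best = i;

    struct pkt *p = q->slots[best];
    /* Close the gap so the remaining packets keep their arrival order. */
    for (size_t i = best; i + 1 < q->len; i++)
        q->slots[i] = q->slots[i + 1];
    q->len--;
    return p;
}

const struct sketch_ops prio_ops = {
    .enqueue = prio_enqueue,
    .dequeue = prio_dequeue,
};
```

Swapping in a different `sketch_ops` changes the scheduling policy without touching `sketch_xmit` or `sketch_run`. In the real kernel the fixed part would also cover the locking, softirq, timer and hierarchy handling mentioned above, none of which this sketch attempts to model.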