Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: 2024-01-18 07:35:43
Also in:
bpf, lkml
On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
This is all back-of-the-envelope calculations, of course. Having some actual numbers to look at would be great; I don't suppose you have a setup where you can run xdp-bench and see how your patches affect the throughput?
No but I probably could set it up.
I chatted with Jesper about this, and he had an idea not too far from
this: split up the XDP and regular stack processing in two stages, each
with their individual batching. So whereas right now we're doing
something like:
run_napi()
bh_disable()
for pkt in budget:
act = run_xdp(pkt)
if (act == XDP_PASS)
run_netstack(pkt) // this is the expensive bit
bh_enable()
We could instead do:
run_napi()
bh_disable()
for pkt in budget:
act = run_xdp(pkt)
if (act == XDP_PASS)
add_to_list(pkt, to_stack_list)
bh_enable()
// sched point
bh_disable()
for pkt in to_stack_list:
run_netstack(pkt)
bh_enable()
This would limit the batching that blocks everything to only the XDP
processing itself, which should limit the maximum time spent in the
blocking state significantly compared to what we have today. The caveat
being that rearranging things like this is potentially a pretty major
refactoring task that needs to touch all the drivers (even if some of
the logic can be moved into the core code in the process). So not really
sure if this approach is feasible, TBH.This does not work because bh_disable() does not disable scheduling. Scheduling may happen. bh_disable() acquires a lock which is currently the only synchronisation point between two say network driver doing NAPI. And this what I want to get rid of. Regarding expensive bit as in XDP_PASS: This doesn't need locking as per proposal, just the REDIRECT piece.
quoted
Daniel said netkit doesn't need this locking because it is not supporting this redirect and it made me think. Would it work to make the redirect structures part of the bpf_prog-structure instead of per-CPU? My understanding is that eBPF's programs data structures are part of it and contain locking allowing one eBPF program preempt another one. Having the redirect structures part of the program would obsolete locking. Do I miss anything?This won't work, unfortunately: the same XDP program can be attached to multiple interfaces simultaneously, and for hardware with multiple receive queues (which is most of the hardware that supports XDP), it can even run simultaneously on multiple CPUs on the same interface. This is the reason why this is all being kept in per-CPU variables today.
So I started hacking this and noticed yesterday and noticed that you can run multiple bpf programs. This is how I learned that it won't work. My plan B is now to move it into task_struct.
-Toke
Sebastian