Thread: 26 messages, 6 authors, 2021-09-28

Re: Redux: Backwards compatibility for XDP multi-buff

From: Toke Høiland-Jørgensen <hidden>
Date: 2021-09-23 18:45:35
Also in: bpf

Zvi Effron [off-list ref] writes:
On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen [off-list ref] wrote:
Jakub Kicinski [off-list ref] writes:
On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
Neither of those are desirable outcomes, I think; and if we add a
separate "XDP multi-buff" switch, we might as well make it system-wide?
If we have an internal flag 'this driver supports multi-buf xdp' cannot we
make xdp_redirect to linearize in case the packet is being redirected
to non multi-buf aware driver (potentially with corresponding non mb aware xdp
progs attached) from mb aware driver?
Hmm, the assumption that XDP frames take up at most one page has been
fundamental from the start of XDP. So what does linearise mean in this
context? If we get a 9k packet, should we dynamically allocate a
multi-page chunk of contiguous memory and copy the frame into that, or
were you thinking something else?
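
A minimal userspace sketch of the first reading of "linearise" raised above: allocate one contiguous chunk and copy every buffer of the frame into it. `struct frag` and `linearize()` are hypothetical names used only for illustration; they are not kernel APIs, and a real implementation would also have to deal with headroom, metadata and page recycling.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one buffer of a multi-buffer frame. */
struct frag {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy all fragments into one freshly allocated contiguous buffer.
 * Returns NULL if the frame is empty or the allocation fails.
 * On success the caller owns the buffer and must free() it.
 */
static unsigned char *linearize(const struct frag *frags, size_t nfrags,
				size_t *total_out)
{
	size_t total = 0, off = 0, i;
	unsigned char *buf;

	for (i = 0; i < nfrags; i++)
		total += frags[i].len;
	if (total == 0)
		return NULL;

	buf = malloc(total);
	if (!buf)
		return NULL;

	for (i = 0; i < nfrags; i++) {
		if (frags[i].len)
			memcpy(buf + off, frags[i].data, frags[i].len);
		off += frags[i].len;
	}

	*total_out = total;
	return buf;
}
```

For a 9k packet this means one allocation of roughly 9k plus a full copy per frame, which is the cost being questioned here.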
My $.02 would be to not care about redirect at all.

It's not like the user experience with redirect is anywhere close
to amazing right now. Besides (with the exception of SW devices which
will likely gain mb support quickly) mixed-HW setups are very rare.
If the source of the redirect supports mb so will likely the target.
It's not about device support, it's about XDP program support: if I run
an MB-aware XDP program on a physical interface and redirect the (MB)
frame into a container, and there's an XDP program running inside that
container that isn't MB-aware, bugs will ensue. Doesn't matter if the
veth driver itself supports MB...

We could leave that as a "don't do that, then" kind of thing, but that
was what we were proposing (as the "do nothing" option) and got some
pushback on, hence why we're having this conversation :)

-Toke
I hadn't even considered the case of redirecting to a veth pair on the same
system. I'm assuming from your statement that the buffers are passed directly
to the ingress inside the container and don't go through the sort of egress
process they would if leaving the system? And I'm assuming that's as an
optimization?
Yeah, if we redirect an XDP frame to a veth, the peer will get the same
xdp_frame, without ever building an SKB.
I'm not sure that makes a difference, though. It's not about whether the
driver's code is mb-capable, it's about whether the driver _as currently
configured_ could generate multiple buffers. If it can, then only an mb-aware
program should be able to be attached to it (and tail called from whatever's
attached to it). If it can't, then there should be no way to have multiple
buffers come to it.
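
The attach rule described above can be sketched as a single check. This is an illustration only: `struct dev_cfg`, `struct prog` and `check_attach()` are hypothetical names, not existing kernel structures or functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: whether the device, as currently configured,
 * could hand a multi-buffer frame to an attached program. */
struct dev_cfg {
	bool can_emit_multi_buf;
};

/* Hypothetical: whether the program declared itself mb-aware. */
struct prog {
	bool mb_aware;
};

/*
 * Refuse to attach a program that is not mb-aware to a device that
 * could generate multiple buffers. Every other combination is fine.
 * Returns 0 on success, -EINVAL if the attach must be rejected.
 */
static int check_attach(const struct dev_cfg *dev, const struct prog *prog)
{
	if (dev->can_emit_multi_buf && !prog->mb_aware)
		return -EINVAL;
	return 0;
}
```

The same check would have to be repeated for every program reachable through tail calls from the attached one.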

So in the situation you've described, either the veth driver should be in a
state where it coalesces the multiple buffers into one (fragmenting the frame
if necessary) or drops the frame, or the program attached inside the container
would need to be mb-aware. I'm assuming with the veth driver as written, this
might mean that all programs attached to the veth driver would need to be
mb-aware, which is obviously undesirable.
Hmm, I guess that as long as mb-frames only show up for large MTUs, the
MTU of the veth device would be a limiting factor just like for physical
devices, so we could just apply the same logic there. Not sure why I
didn't consider that before :/
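
A rough sketch of the MTU reasoning above, assuming a frame needs more than one buffer only when the MTU plus per-frame overhead no longer fits in a single buffer. The function name and all parameters are hypothetical; actual header sizes, reserved headroom/tailroom and buffer sizes depend on the driver.

```c
#include <stdbool.h>

/*
 * Returns true if a maximum-size frame for this MTU cannot fit in one
 * buffer of buf_size bytes.
 *
 * mtu:       configured device MTU (L3 payload size)
 * l2_hdr:    link-layer header length (e.g. 14 for plain Ethernet)
 * reserved:  bytes the driver sets aside per buffer (headroom, tailroom)
 * buf_size:  size of a single receive buffer (typically one page)
 */
static bool mtu_needs_multi_buf(unsigned int mtu, unsigned int l2_hdr,
				unsigned int reserved, unsigned int buf_size)
{
	return mtu + l2_hdr + reserved > buf_size;
}
```

With placeholder values of a 4096-byte buffer and 576 reserved bytes, an MTU of 1500 stays single-buffer while an MTU of 9000 does not, so the veth MTU could gate mb-frames the same way a physical device's MTU does.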
All of which significantly adds to the complexity to support mb-aware, so maybe
this could be developed later? Initially we could have a sysctl toggling the
state: 0 = single-buffer only, 1 = multi-buffer allowed. Then later we _could_ add a
state for dynamic control once all XDP supporting drivers support the necessary
dynamic functionality (if ever). At that point we'd have actual experience with
the sysctl and could see how much of a burden having static control is.
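
As an illustration of the two-state switch proposed above, the following sketch parses a value written to such a knob and rejects anything other than 0 or 1. The enum, its constants and `parse_mb_mode()` are hypothetical; no such sysctl is confirmed to exist, and a later "dynamic" state would simply be another accepted value.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical states of the proposed system-wide switch. */
enum mb_mode {
	MB_SINGLE_BUFFER_ONLY = 0,
	MB_MULTI_BUFFER_ALLOWED = 1,
};

/*
 * Parse the text written to the (hypothetical) sysctl.
 * Accepts "0" or "1", optionally followed by whitespace/newline.
 * Returns 0 and stores the mode on success, -EINVAL otherwise.
 */
static int parse_mb_mode(const char *s, enum mb_mode *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s)
		return -EINVAL;

	/* Tolerate trailing whitespace, e.g. the newline added by echo. */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (v != MB_SINGLE_BUFFER_ONLY && v != MB_MULTI_BUFFER_ALLOWED)
		return -EINVAL;

	*out = (enum mb_mode)v;
	return 0;
}
```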

I may have been misinterpreting your use case though, and you were talking
about the XDP program running on the egress side of the redirect? Is that the
case you were talking about?
No I was talking about exactly what you outlined above. Although longer
term, I also think we can use XDP mb as a way to avoid having to
linearise SKBs when running XDP on them in veth (and for generic XDP) :)

-Toke