Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

From: Song Liu <song@kernel.org>
Date: 2023-11-18 16:08:52
Also in: bpf, kvm, linux-doc, linux-kselftest, lkml, virtualization

Hi,

A few rookie questions below.

On Sat, Nov 18, 2023 at 2:39 AM Akihiko Odaki [off-list ref] wrote:

On 2023/10/18 4:19, Akihiko Odaki wrote:


On 2023/10/18 4:03, Alexei Starovoitov wrote:

[...]


I would also appreciate it if you have some documentation or links to
relevant discussions on the mailing list. That would avoid repeating
discussions you may already have had in the past.

Hi,

The discussion has been stuck for a month, but I'd still like to
continue figuring out the best way for the whole kernel to implement
this feature. Below I summarize the current situation and the question
that needs to be answered before pushing this forward:

The goal of this RFC is to allow reporting hash values calculated by
the eBPF steering program. It essentially just reports 4 bytes from the
kernel to userspace.

AFAICT, the proposed design is to have BPF generate some data
(namely a hash, but it could be anything AFAICT) and consume it from
user space. Instead of updating __sk_buff, can we have user space
fetch the data/hash from a BPF map? If this is an option, I guess we
can implement the same feature with BPF tracing programs?

Unfortunately, however, it is not acceptable for the BPF subsystem
because the "stable" BPF is completely frozen these days. The
"unstable/kfunc" BPF is an alternative, but the eBPF program will be
shipped with a portable userspace program (QEMU)[1], so the lack of
interface stability is not tolerable.

BPF kfuncs are as stable as exported symbols. Is exported-symbol-like
stability enough for this use case? (I would assume yes.)

Another option is to hardcode in the kernel[2] the algorithm that has
conventionally been implemented with the eBPF steering program. This is
possible because the algorithm strictly follows the virtio-net
specification[3]. However, there are proposals to add different
algorithms to the specification[4], and hardcoding the algorithm in the
kernel will require adding more UAPIs and code each time such a
specification change happens, which is not good for tuntap.
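For context, the hash the virtio-net specification[3] defines for RSS is the Toeplitz hash, and the "4 bytes" discussed in this thread are its 32-bit result. The following is a minimal C sketch of the standard Toeplitz formulation, assuming the IPv4/TCP input layout (source address, destination address, source port, destination port, all in network byte order) and using the 40-byte sample key from Microsoft's RSS verification suite. The names `rss_key`, `key_byte`, and `toeplitz_hash` are illustrative only; this is not QEMU's eBPF program[1] or the kernel patch[2].

```c
#include <stddef.h>
#include <stdint.h>

/* 40-byte sample key from Microsoft's RSS verification suite
 * (illustrative; a real device gets its key from the driver). */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Returns key byte i, or 0 once we run past the end of the key. */
static uint8_t key_byte(const uint8_t *key, size_t key_len, size_t i)
{
    return i < key_len ? key[i] : 0;
}

/* Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit key window aligned with that bit, then slide the window
 * left by one bit, pulling in the next key bit. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    uint32_t window = 0;

    /* Load the leftmost 32 bits of the key. */
    for (size_t i = 0; i < 4; i++)
        window = (window << 8) | key_byte(key, key_len, i);

    for (size_t i = 0; i < len; i++) {
        /* Key byte whose bits enter the window while input byte i
         * is processed (the window is 4 bytes ahead of the input). */
        uint8_t next = key_byte(key, key_len, i + 4);

        for (int b = 0; b < 8; b++) {
            if (data[i] & (0x80u >> b))
                result ^= window;
            window = (window << 1) | ((next >> (7 - b)) & 1u);
        }
    }
    return result;
}
```

With this key, the 12-byte TCPv4 input for 66.9.149.187:2794 to 161.142.100.80:1766 yields 0x51ccc178, and the 8-byte address-only input yields 0x323e8fc2, matching Microsoft's published verification table. A new hash type added to the specification would mean a different input layout or function, which is exactly what would need new kernel code and UAPI if the algorithm were hardcoded.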

The requirement looks similar to hid-bpf. Could you explain why that
model is not enough? HID also requires some stability AFAICT.

Thanks,
Song

In short, the proposed feature requires making one of two compromises:

1. Compromise on the BPF side: Relax the "stable" BPF feature freeze
once and allow the eBPF steering program to report 4 more bytes to the kernel.

2. Compromise on the tuntap side: Implement the algorithm in the kernel,
and abandon the capability to update the algorithm without changing the
kernel.

IMHO, it's better to make a compromise on the BPF side (option
1). We should minimize the total UAPI changes in the whole kernel, and
option 1 is much superior in that sense.

Yet I have to note that such a compromise on the BPF side can make the
"stable" BPF feature freeze fragile and invite complaints like "you
allowed changing stable BPF for this, so why do you reject [some other
request to change stable BPF]?" That is bad for BPF maintainers. (I
can imagine that introducing and maintaining widely different BPF
interfaces is too much of a burden.) And, of course, this requires
approval from BPF maintainers.

So I'd like to ask you which of these compromises you think is worse.
Please also tell me if you have another idea.

Regards,
Akihiko Odaki

[1] https://qemu.readthedocs.io/en/v8.1.0/devel/ebpf_rss.html
[2] https://lore.kernel.org/all/20231008052101.144422-1-akihiko.odaki@daynix.com/
[3] https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-2400003
[4] https://lore.kernel.org/all/CACGkMEuBbGKssxNv5AfpaPpWQfk2BHR83rM5AHXN-YVMf2NvpQ@mail.gmail.com/
