Re: [PATCH v2 bpf-next 00/18] BPF token
From: Djalal Harouni <hidden>
Date: 2023-06-09 22:30:12
Hi Andrii,

On Thu, Jun 8, 2023 at 1:54 AM Andrii Nakryiko [off-list ref] wrote:
This patch set introduces a new BPF object, BPF token, which allows delegating a subset of BPF functionality from a privileged system-wide daemon (e.g., systemd or any other container manager) to a *trusted* unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of the respective privileged application that would create a BPF token.

The main motivation for BPF token is a desire to enable containerized BPF applications to be used together with user namespaces. This is currently impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF helpers like bpf_probe_read_kernel() and bpf_probe_read_user(), can safely read arbitrary memory, and it's impossible to ensure that they only read memory of processes belonging to any given namespace. This means that it's impossible to have a namespace-aware CAP_BPF capability, and as such another mechanism to allow safe usage of BPF functionality is necessary. BPF token, and delegation of it to trusted unprivileged applications, is such a mechanism.

The kernel makes no assumption about what "trusted" constitutes in any particular case, and it's up to specific privileged applications and their surrounding infrastructure to decide that. What the kernel provides is a set of APIs to create and tune a BPF token, and pass it around to privileged BPF commands that are creating new BPF objects like BPF programs, BPF maps, etc.
Is there a reason for coupling this only with userns? Can't the "trusted unprivileged" application assumed by systemd also be in the init userns?
A previous attempt at addressing this very same problem ([0]) tried to utilize an authoritative LSM approach, but was conclusively rejected by upstream LSM maintainers. The BPF token concept is not changing anything about the LSM approach, but can be combined with LSM hooks for very fine-grained security policy. Some ideas about making BPF token more convenient to use with LSM (in particular custom BPF LSM programs) were briefly described in a recent LSF/MM/BPF 2023 presentation ([1]). E.g., an ability to specify user-provided data (context), which in combination with BPF LSM would allow implementing very dynamic and fine-grained custom security policies on top of BPF token. In the interest of minimizing API surface area discussions, this is going to be added in follow-up patches, as it's not essential to the fundamental concept of a delegatable BPF token.

It should be noted that BPF token is conceptually quite similar to the idea of a /dev/bpf device file, proposed by Song a while ago ([2]). The biggest difference is the idea of using a virtual anon_inode file to hold a BPF token and allowing multiple independent instances of them, each with its own set of restrictions. BPF pinning solves the problem of exposing such a BPF token through the file system (BPF FS, in this case) for cases where transferring FDs over Unix domain sockets is not convenient. And also, crucially, the BPF token approach is not using any special stateful task-scoped flags. Instead, bpf()
What's the use case for transferring over Unix domain sockets? Will BPF token translation happen when crossing different namespaces? If the token is pinned into different bpffs instances, will the token share the same context?
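
For context on the mechanism being asked about: passing an open FD between processes over a Unix domain socket is normally done with SCM_RIGHTS ancillary data. Below is a minimal sketch of that general mechanism only, assuming Python 3.9+ on Linux. A pipe FD is used as a hypothetical stand-in for a BPF token FD, and the helper names (send_fd, recv_fd, demo) are illustrative; nothing here uses the BPF token API proposed in this series or says how a token would behave across namespaces.

```python
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open file descriptor over a connected Unix domain socket."""
    # On Linux stream sockets, at least one byte of regular data should
    # accompany the SCM_RIGHTS ancillary message, so send a dummy byte.
    socket.send_fds(sock, [b"x"], [fd])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor; the receiver gets its own FD number."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo() -> bytes:
    # Hypothetical stand-in: a pipe plays the role of the delegated object.
    # A real BPF token FD would come from the kernel API under discussion.
    read_end, write_end = os.pipe()
    # A socketpair stands in for the manager <-> application connection.
    manager, app = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_fd(manager, write_end)
        received = recv_fd(app)
        # The received FD refers to the same open file description,
        # so writing through it is visible on the pipe's read end.
        os.write(received, b"hello")
        os.close(received)
        os.close(write_end)
        return os.read(read_end, 5)
    finally:
        os.close(read_end)
        manager.close()
        app.close()
```

In this sketch both ends live in one process for brevity; in practice the sender would be the privileged manager and the receiver the unprivileged application, connected over a socket both can reach.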