Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski <luto@kernel.org>
Date: 2019-08-13 23:06:15
Also in:
bpf, linux-security-module, netdev
On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov [off-list ref] wrote:
>
> On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > Inside containers and inside nested containers we need to start processes that will use bpf. All of the processes are trusted.
> >
> > Trusted by whom? In a non-nested container, the container manager *might* be trusted by the outside world. In a *nested* container, unless the inner container management is controlled from outside the outer container, it's not trusted. I don't know much about how Facebook's containers work, but the LXC/LXD/Podman world is moving very strongly toward user namespaces and maximally-untrusted containers, and I think bpf() should work in that context.
>
> agree that containers (namespaces) reduce amount of trust necessary for apps to run, but the end goal is not security though. Linux has become a single user system. If user can ssh into the host they can become root. If arbitrary code can run on the host it will break out of any sandbox.
I would argue that this is a reasonable assumption to make if you're designing a system using Linux, but it's not a valid assumption to make as kernel developers. Otherwise we should just give everyone CAP_SYS_ADMIN and call it a day. There really is a difference between root and non-root.
> Containers are not providing the level of security that is enough to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy. Containers are used to make production systems safer. Some people call it more 'secure', but it's clearly not secure for arbitrary code and that is what kernel.unprivileged_bpf_disabled allows. When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program. It's been a constant source of pain. The constant blinding, randomization, verifier speculative analysis, all spectre v1, v2, v4 mitigations are simply not worth it. It's a lot of complex kernel code without users.
Seccomp really will want eBPF some day, and it should work without privilege. Maybe it should be a restricted subset of eBPF, and Spectre will always be an issue until dramatically better hardware shows up, but I think people will want the ability for regular programs to load eBPF seccomp programs.
> Hence I prefer this /dev/bpf mechanism to be as simple as possible. The applications that will use it are going to be just as trusted as systemd.
I still don't understand your systemd example. systemd --user is not trusted systemwide in any respect. The main PID 1 systemd is root. No matter how you dice it, granting a user systemd instance extra bpf access is tantamount to granting the user extra bpf access in general.
It sounds to me like you're thinking of eBPF as a feature a bit like unprivileged user namespaces: *in principle*, it's supposed to be safe to give any unprivileged process the ability to use it, and you consider security flaws in it to be bugs worth fixing. But you think it's a large attack surface and that most unprivileged programs shouldn't be allowed to use it. Is that reasonable?
> > > To solve your concern of bypassing all capable checks... How about we do /dev/bpf/full_verifier first? It will replace capable() checks in the verifier only.
> >
> > I'm not convinced that "in the verifier" is the right distinction. Telling administrators that some setting lets certain users bypass bpf() verifier checks doesn't have a clear enough meaning.
>
> linux is a single user system. there are no administrators any more. No doubt, folks will disagree, but that game is over. At least on bpf side it's done.
> > I propose, instead, that the current capable() checks be divided into three categories:
>
> I don't see a use case for these categories. All bpf programs extend the kernel in some way. The kernel vs user is one category. Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN. When application has CAP_NET_ADMIN it covers all of networking knobs. There is no use case that would warrant fine grain CAP_ROUTE_ADMIN, CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc. Similarly CAP_BPF as the only knob is enough. The only disadvantage of CAP_BPF is that it's not possible to pass it from one systemd-like daemon to another systemd-like daemon. Hence /dev/bpf idea and passing file descriptor.
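For concreteness, the fd-passing half of that idea can be sketched in userspace with SCM_RIGHTS over a Unix socket. This is only an illustration under stated assumptions: /dev/bpf is the device proposed in this series, so a temporary file stands in for it here, and grant(), receive_grant() and demo() are made-up names. It needs Python 3.9 or later for socket.send_fds() and socket.recv_fds().

```python
import os
import socket
import tempfile


def grant(sock, fd):
    """Send an open file descriptor to the peer as SCM_RIGHTS ancillary data."""
    # At least one byte of ordinary data has to accompany the descriptor.
    socket.send_fds(sock, [b"x"], [fd])


def receive_grant(sock):
    """Receive one file descriptor from the peer and return it."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("peer did not pass a file descriptor")
    return fds[0]


def demo():
    """Pass a stand-in fd between two endpoints; return what the receiver reads."""
    # Hypothetical roles: 'holder' opened the object, 'delegate' receives access.
    holder, delegate = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as f:  # stand-in for the proposed /dev/bpf
            f.write(b"token")
            f.flush()
            grant(holder, f.fileno())
            fd = receive_grant(delegate)
        # The holder's copy is closed now; the received descriptor still works.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 5)
        os.close(fd)
        return data
    finally:
        holder.close()
        delegate.close()
```

The point of the sketch is that whoever holds the descriptor can hand it on to anyone it can talk to, which is exactly what a capability bit cannot do and also why the question of who is trusted matters.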
> > This type of thing actually fits quite nicely into an idea I've been thinking about for a while called "implicit rights". In very brief summary, there would be objects called /dev/rights/xyz, where xyz is the name of a "right". If there is a readable object of the right type at the literal path "/dev/rights/xyz", then you have right xyz. There's a bit more flexibility on top of this. BPF could use /dev/rights/bpf/maptypes/lpm and /dev/rights/bpf/verifier/bounded_loops, for example. Other non-BPF use cases include a biggie: /dev/rights/namespace/create_unprivileged_userns. /dev/rights/bind_port/80 would be nice, too.
>
> The concept of "implicit rights" is very nice and I'm sure it will be a good fit somewhere, but I don't see why use it in bpf space. There is no use case for fine grain partition of bpf features.
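To make the "implicit rights" lookup a bit more concrete, here is a minimal userspace sketch of the check as summarized above: a right is held if a readable object of the expected type exists at the literal path. Nothing like /dev/rights exists today, so everything here is an assumption: has_right() is a hypothetical helper, the root directory is a parameter so the check can be tried against a scratch directory, and a regular file stands in for whatever object type a real implementation would require.

```python
import os
import stat


def has_right(name, root="/dev/rights", expected_type=stat.S_ISREG):
    """Return True if a readable object of the expected type exists at root/name.

    Hypothetical helper: the object type and the use of stat() rather than
    lstat() are assumptions made for this illustration only.
    """
    path = os.path.join(root, name)
    try:
        st = os.stat(path)
    except OSError:
        # No object at the literal path means the right is not held.
        return False
    # The object must be of the expected type and readable by the caller.
    return expected_type(st.st_mode) and os.access(path, os.R_OK)
```

With a scratch directory containing bpf/maptypes/lpm, has_right("bpf/maptypes/lpm", root=scratch) is true, while has_right("bind_port/80", root=scratch) is false because nothing exists at that path. A container manager could then grant a right simply by making the corresponding object visible inside the container.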