Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov <hidden>
Date: 2019-08-28 04:43:47
Also in:
bpf, linux-api, netdev
On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
I was hoping for something in Documentation/admin-guide, not in a changelog that's hard to find.
eventually yes.
quoted
quoted
Changing the capability that some existing operation requires could break existing programs. The old capability may need to be accepted as well.As far as I can see there is no ABI breakage. Please point out which line of the patch may break it.As a more or less arbitrary selection: void bpf_prog_kallsyms_add(struct bpf_prog *fp) { if (!bpf_prog_kallsyms_candidate(fp) || - !capable(CAP_SYS_ADMIN)) + !capable(CAP_BPF)) return; Before your patch, a task with CAP_SYS_ADMIN could do this. Now it can't. Per the usual Linux definition of "ABI break", this is an ABI break if and only if someone actually did this in a context where they have CAP_SYS_ADMIN but not all capabilities. How confident are you that no one does things like this? void bpf_prog_kallsyms_add(struct bpf_prog *fp) { if (!bpf_prog_kallsyms_candidate(fp) || - !capable(CAP_SYS_ADMIN)) + !capable(CAP_BPF)) return;
Yes. I'm confident that apps don't drop everything and leave cap_sys_admin only before doing bpf() syscall, since it would break their own use of networking. Hence I'm not going to do the cap_syslog-like "deprecated" message mess because of this unfounded concern. If I turn out to be wrong we will add this "deprecated mess" later.
From the previous discussion, you want to make progress toward solving a lot of problems with CAP_BPF. One of them was making BPF firewalling more generally useful. By making CAP_BPF grant the ability to read kernel memory, you will make administrators much more nervous to grant CAP_BPF.
Andy, were your email hacked? I explained several times that in this proposal CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory. CAP_BPF alone is _not enough_.
Similarly, and correct me if I'm wrong, most of these capabilities are primarily or only useful for tracing, so I don't see why users without CAP_TRACING should get them. bpf_trace_printk(), in particular, even has "trace" in its name :) Also, if a task has CAP_TRACING, it's expected to be able to trace the system -- that's the whole point. Why shouldn't it be able to use BPF to trace the system better?
CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.
quoted
For example: BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr) { int ret; ret = probe_kernel_read(dst, unsafe_ptr, size); if (unlikely(ret < 0)) memset(dst, 0, size); return ret; } All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF. But the kernel primitives its using (probe_kernel_read) is controlled by CAP_TRACING. Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run bpf program that uses bpf_probe_read. Similar with all other kernel code that BPF helpers may call directly or indirectly. If there is a way for bpf program to call into piece of code controlled by CAP_TRACING such helper would need CAP_BPF and CAP_TRACING. If bpf helper calls into something that may mangle networking packet such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.Why do you want to require CAP_BPF to call into functions like bpf_probe_read()? I understand why you want to limit access to bpf, but I think that CAP_TRACING should be sufficient to allow the tracing parts of BPF. After all, a lot of your concerns, especially the ones involving speculation, don't really apply to users with CAP_TRACING -- users with CAP_TRACING can read kernel memory with or without bpf.
Let me try again to explain the concept... Imagine AUDI logo with 4 circles. They partially intersect. The first circle is CAP_TRACING. Second is CAP_BPF. Third is CAP_NET_ADMIN. Fourth - up to your imagination :) These capabilities subdivide different parts of root privileges. CAP_NET_ADMIN is useful on its own. Just as CAP_TRACING that is useful for perf, ftrace, and probably other tracing things that don't need bpf. 'bpftrace' is using a lot of tracing and a lot of bpf features, but not all of bpf and not all tracing. It falls into intersection of CAP_BPF and CAP_TRACING. probe_kernel_read is a tracing mechanism. perf can use it without bpf. Hence it should be controlled by CAP_TRACING. bpf_probe_read is a wrapper of that mechanism. It's a place where BPF and TRACING circles intersect. A task needs to have both CAP_BPF (to load the program) and CAP_TRACING (to read kernel memory) to execute bpf_probe_read() helper.
quoted
quoted
quoted
@@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr, struct bpf_prog *prog; int ret = -ENOTSUPP; - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF)) + /* test_run callback is available for networking progs only. + * Add cap_bpf_tracing() above when tracing progs become runable. + */I think test_run should probably be CAP_SYS_ADMIN forever. test_run is the only way that one can run a bpf program and call helper functions via the program if one doesn't have permission to attach the program.Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task with these two permissions will have programs running anyway. (traffic will flow through netdev, socket events will happen, etc) Hence no reason to disallow running program via test_run.test_run allows fully controlled inputs, in a context where a program can trivially flush caches, mistrain branch predictors, etc first. It seems to me that, if a JITted bpf program contains an exploitable speculation gadget (MDS, Spectre v1, RSB, or anything else),
speaking of MDS... I already asked you to help investigate its applicability with existing bpf exposure. Are you going to do that?
it will be *much* easier to exploit it using test_run than using normal network traffic. Similarly, normal network traffic will have network headers that are valid enough to have caused the BPF program to be invoked in the first place. test_run can inject arbitrary garbage.
Please take a look at Jann's var1 exploit. Was it hard to run bpf prog in controlled environment without test_run command ?