Thread (75 messages) 75 messages, 11 authors, 2019-08-27

Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

From: Andy Lutomirski <luto@amacapital.net>
Date: 2019-07-24 01:40:33
Also in: bpf, netdev

Possibly related (same subject, not in this thread)

On Jul 23, 2019, at 3:56 PM, Song Liu [off-list ref] wrote:


quoted
On Jul 23, 2019, at 8:11 AM, Andy Lutomirski [off-list ref] wrote:

On Mon, Jul 22, 2019 at 1:54 PM Song Liu [off-list ref] wrote:
quoted
Hi Andy, Lorenz, and all,
quoted
On Jul 2, 2019, at 2:32 PM, Andy Lutomirski [off-list ref] wrote:

On Tue, Jul 2, 2019 at 2:04 PM Kees Cook [off-list ref] wrote:
quoted
quoted
On Mon, Jul 01, 2019 at 06:59:13PM -0700, Andy Lutomirski wrote:
I think I'm understanding your motivation.  You're not trying to make
bpf() generically usable without privilege -- you're trying to create
a way to allow certain users to access dangerous bpf functionality
within some limits.

That's a perfectly fine goal, but I think you're reinventing the
wheel, and the wheel you're reinventing is quite complicated and
already exists.  I think you should teach bpftool to be secure when
installed setuid root or with fscaps enabled and put your policy in
bpftool.  If you want to harden this a little bit, it would seem
entirely reasonable to add a new CAP_BPF_ADMIN and change some, but
not all, of the capable() checks to check CAP_BPF_ADMIN instead of the
capabilities that they currently check.
If finer grained controls are wanted, it does seem like the /dev/bpf
path makes the most sense. open, request abilities, use fd. The open can
be mediated by DAC and LSM. The request can be mediated by LSM. This
provides a way to add policy at the LSM level and at the tool level.
(i.e. For tool-level controls: leave LSM wide open, make /dev/bpf owned
by "bpfadmin" and bpftool becomes setuid "bpfadmin". For fine-grained
controls, leave /dev/bpf wide open and add policy to SELinux, etc.)

With only a new CAP, you don't get the fine-grained controls. (The
"request abilities" part is the key there.)
Sure you do: the effective set.  It has somewhat bizarre defaults, but
I don't think that's a real problem.  Also, this wouldn't be like
CAP_DAC_READ_SEARCH -- you can't accidentally use your BPF caps.

I think that a /dev capability-like object isn't totally nuts, but I
think we should do it well, and this patch doesn't really achieve
that.  But I don't think bpf wants fine-grained controls like this at
all -- as I pointed upthread, a fine-grained solution really wants
different treatment for the different capable() checks, and a bunch of
them won't resemble capabilities or /dev/bpf at all.
With 5.3-rc1 out, I am back on this. :)

How about we modify the set as:
1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf.
I'm fine with this in principle, but:
quoted
2. Better handling of capable() calls through bpf code. I guess the
   biggest problem here is is_priv in verifier.c:bpf_check().
I think it would be good to understand exactly what /dev/bpf will
enable one to do.  Without some care, it would just become the next
CAP_SYS_ADMIN: if you can open it, sure, you're not root, but you can
intercept network traffic, modify cgroup behavior, and do plenty of
other things, any of which can probably be used to completely take
over the system.
Well, yes. sys_bpf() is pretty powerful. 

The goal of /dev/bpf is to enable special users to call sys_bpf(). In 
the meanwhile, such users should not take down the whole system easily
by accident, e.g., with rm -rf /.
That’s easy, though — bpftool could learn to read /etc/bpfusers before allowing ruid != 0.
It is similar to CAP_BPF_ADMIN, without really adding the CAP_.  

I think adding new CAP_ requires much more effort. 
A new CAP_ is straightforward — add the definition and change the max cap.
quoted
It would also be nice to understand why you can't do what you need to
do entirely in user code using setuid or fscaps.
It is not very easy to achieve the same control: only certain users can
run certain tools (bpftool, etc.). 

The closest approach I can find is:
 1. use libcap (pam_cap) to give CAP_SETUID to certain users;
 2. add setuid(0) to bpftool.

The difference between this approach and /dev/bpf is that certain users
would be able to run other tools that call setuid(). Though I am not 
sure how many tools call setuid(), and how risky they are. 
I think you’re misunderstanding me. Install bpftool with either the setuid (S_ISUID) mode or with an appropriate fscap bit — see the setcap(8) manpage.

The downside of this approach is that it won’t work well in a container, and containers are cool these days :)
quoted
Finally, at risk of rehashing some old arguments, I'll point out that
the bpf() syscall is an unusual design to begin with.  As an example,
consider bpf_prog_attach().  Outside of bpf(), if I want to change the
behavior of a cgroup, I would write to a file in
/sys/kernel/cgroup/unified/whatever/, and normal DAC and MAC rules
apply.  With bpf(), however, I just call bpf() to attach a program to
the cgroup.  bpf() says "oh, you are capable(CAP_NET_ADMIN) -- go for
it!".  Unless I missed something major, and I just re-read the code,
there is no check that the caller has write or LSM permission to
anything at all in cgroupfs, and the existing API would make it very
awkward to impose any kind of DAC rules here.

So I think it might actually be time to repay some techincal debt and
come up with a real fix.  As a less intrusive approach, you could see
about requiring ownership of the cgroup directory instead of
CAP_NET_ADMIN.  As a more intrusive but perhaps better approach, you
could invert the logic to to make it work like everything outside of
cgroup: add pseudo-files like bpf.inet_ingress to the cgroup
directories, and require a writable fd to *that* to a new improved
attach API.  If a user could do:

int fd = open("/sys/fs/cgroup/.../bpf.inet_attach", O_RDWR);  /* usual
DAC and MAC policy applies */
int bpf_fd = setup the bpf stuff;  /* no privilege required, unless
the program is huge or needs is_priv */
bpf(BPF_IMPROVED_ATTACH, target = fd, program = bpf_fd);

there would be no capabilities or global privilege at all required for
this.  It would just work with cgroup delegation, containers, etc.

I think you could even pull off this type of API change with only
libbpf changes.  In particular, there's this code:

int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
                  unsigned int flags)
{
      union bpf_attr attr;

      memset(&attr, 0, sizeof(attr));
      attr.target_fd     = target_fd;
      attr.attach_bpf_fd = prog_fd;
      attr.attach_type   = type;
      attr.attach_flags  = flags;

      return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
}

This would instead do something like:

int specific_target_fd = openat(target_fd, bpf_type_to_target[type], O_RDWR);
attr.target_fd = specific_target_fd;
...

return sys_bpf(BPF_PROG_IMPROVED_ATTACH, &attr, sizeof(attr));

Would this solve your problem without needing /dev/bpf at all?
This gives fine grain access control. I think it solves the problem. 
But it also requires a lot of rework to sys_bpf(). And it may also 
break backward/forward compatibility?
I think the compatibility issue is manageable. The current bpf() interface would be supported for at least several years, and libbpf could detect that the new interface isn’t supported and fall back the old interface
Personally, I think it is an overkill for the original motivation: 
call sys_bpf() with special user instead of root. 
It’s overkill for your specific use case, but I’m trying to encourage you to either solve your problem entirely in userspace or to solve a more general problem in the kernel :)

In furtherance of bpf’s goal of world domination, I think it would be great if it Just Worked in a container. My proposal does this.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help