Thread: 62 messages, 8 authors, 2014-07-05

Re: [PATCH RFC net-next 03/14] bpf: introduce syscall(BPF, ...) and BPF maps

From: Andy Lutomirski <hidden>
Date: 2014-06-28 15:34:39
Also in: lkml, netdev

On Fri, Jun 27, 2014 at 11:43 PM, Alexei Starovoitov [off-list ref] wrote:
On Fri, Jun 27, 2014 at 11:25 PM, Andy Lutomirski [off-list ref] wrote:
On Fri, Jun 27, 2014 at 10:55 PM, Alexei Starovoitov [off-list ref] wrote:
On Fri, Jun 27, 2014 at 5:16 PM, Andy Lutomirski [off-list ref] wrote:
On Fri, Jun 27, 2014 at 5:05 PM, Alexei Starovoitov [off-list ref] wrote:
BPF syscall is a demux for different BPF-related commands.

'maps' are generic storage of different types for sharing data between the kernel
and userspace.

The maps can be created/deleted from user space via BPF syscall:
- create a map with given id, type and attributes
  map_id = bpf_map_create(int map_id, map_type, struct nlattr *attr, int len)
  returns positive map id or negative error

- delete map with given map id
  err = bpf_map_delete(int map_id)
  returns zero or negative error
What's the scope of "id"?  How is it secured?
The map and program id space is global, and it's CAP_SYS_ADMIN only.
There is no pressing need for per-user limits,
so the whole thing is root-only for now.
Hmm.  This may be unpleasant if you ever want to support non-root or
namespaced operation.
I think it will be easy to extend it per namespace when we lift the
root-only restriction. It will be seamless, without user API changes.
It might be seamless, but I'm not sure it'll be very useful.  See below.
How hard would it be to give these things fds?
You mean programs/maps auto-terminate when the creator process
exits? I thought about it, and it's appealing at first glance, but it
doesn't fit the model of existing tracepoint events, which are global.
Programs attached to events need to live on without a 'daemon'
hanging around. Therefore I picked a 'kernel module'-like method.
Here are some things I'd like to be able to do:

 - Load an eBPF program and use it as a seccomp filter.

 - Create a read-only map and reference it from a seccomp filter.

 - Create a data structure that a seccomp filter can write but that
the filtered process can only read.

 - Create a data structure that a seccomp filter can read but that
some other trusted process can write.

 - Create a network filter of some sort and give permission to
manipulate a list of ports to an otherwise untrusted process.

The first four of these shouldn't require privilege.

All of this fits nicely into a model where all of the eBPF objects
(filters and data structures) are represented by fds.  Read access to
the fd lets you read (or execute eBPF programs).  Write access to the
fd lets you write.  You can send them around naturally using
SCM_RIGHTS, and you can create deprivileged versions by reopening the
objects with less access.

All of this *could* fit in using global ids, but we'd need to answer
questions like "what namespace are they bound to" and "who has access
to a given fd".  I'd want to see that these questions *have* good
answers before committing to this type of model.  Keep in mind that,
for seccomp in particular, granting access to a specific uid will be
very limiting: part of the point of seccomp is to enable
user-controlled finer-grained permissions than allowed by uids and
gids.

--Andy