Thread (62 messages) 62 messages, 8 authors, 2014-07-05

Re: [PATCH RFC net-next 03/14] bpf: introduce syscall(BPF, ...) and BPF maps

From: Andy Lutomirski <hidden>
Date: 2014-07-04 15:17:49
Also in: linux-api, lkml

On Wed, Jul 2, 2014 at 7:29 PM, Alexei Starovoitov [off-list ref] wrote:
On Wed, Jul 2, 2014 at 6:43 PM, Andy Lutomirski [off-list ref] wrote:
quoted
On Tue, Jul 1, 2014 at 10:33 PM, Alexei Starovoitov [off-list ref] wrote:
quoted
I want to avoid string names, since they will force new 'strtab', 'symtab'
sections in the programs/maps and will uglify the user interface quite a bit.
To be fair, you really need to imitate ELF here.  A very simple
relocation-like table should do the trick.
simple.. right :) I do see the amount of struggle you have with
binutils and vdso.
I really don't want to add relocation unless this is last resort.
Especially since it can be solved without it.
I don't think I explained it enough in my last email… trying again:
quoted
quoted
Back in september one loadable unit was: one eBPF program + set of maps,
but tracing requirements forced a change, since multiple programs need
to access the same map and maps may need to be pre-populated before
the programs start executing, so I've split maps and programs into mostly
independent entities, but programs still need to think of maps as local:
For example I want to do a skb leak check 'tracing filter':
- attach this program to kretprobe of __alloc_skb():
  u64 key = (u64) skb;
  u64 value = bpf_get_time();
  bpf_update_map_elem(1/*const_map_id*/, &key, &value);
- attach this program to consume_skb and kfree_skb tracepoints:
  u64 key = (u64) skb;
  bpf_delete_map_elem(1/*const_map_id*/, &key);
- and have user space do:
  prior to loading:
  bpf_create_map(1/*map_id*/, 8/*key_size*/, 8/*value*/, 1M /*max_entries*/)
  and then periodically iterate the map to see whether any skb stayed
  in the map for too long.

Programs need to be written with hard coded map_ids otherwise usability
suffers, so I did global 32-bit id in this RFC
, but this indeed doesn't work
Really?  That will mean that you have to edit the source of your
filter program if the small integer map number you chose conflicts
with another program.  That sounds unpleasant.
unpleasant. exactly. that's why I'm proposing per-process local map-id,
so that programs don't need to be edited.
quoted
quoted
for unprivileged chrome browser unless programs are previously loaded
by root and chrome only does attach to seccomp.

So here is the non-root bpf syscall interface I'm thinking about:

ufd = bpf_create_map(map_id, key_size, value_size, max_entries);

it will create a global map in the system which will be accessible
in this process via 'ufd'. Internally this 'ufd' will be assigned global map_id
and process-local map_id that was passed as a 1st argument.
To do update/lookup the process will use bpf_map_xxx_elem(ufd,…)
Erk.  Unprivileged programs shouldn't be able to allocate global ids
of their choosing, especially if privileged programs can also do it.
Otherwise unprivileged programs can force a collision and possibly
steal information.
of course. that's not what said.
quoted
quoted
Then to load eBPF program the process will do:
ufd = bpf_prog_load(prog_type, ebpf_insn_array, license)
and instructions will be referring to maps via local map_id that
was hard coded as part of the program.
I think relocations would be must prettier than per-process map id tables.
I think per process map_id are much cleaner.

non-root API:

ufd = bpf_create_map(local_map_id,… )
bpf_map_update/delete/lookup_elem(ufd,…)
ufd = bpf_prog_load(insns)
close(ufd)

root only API:

global_id = bpf_get_id(ufd) // returns either map or prog global id
bpf_map_delete(global_map_id)
bpf_prog_unload(global_prog_id)

Details:

ufd = bpf_create_map(local_map_id, ...);

local_map_id - process local map_id
(this id is used to access maps from eBPF program loaded by this process)

ufd - process local file descriptor
(used to update/lookup maps from this process)

global_map_id = bpf_get_id(ufd)
this is root only call to get global_ids and pass them to global events
like tracing.

global ids will only be seen by root. There is no way for root or non-root
to influence id ranges.
quoted
quoted
Beyond the normal create_map, update/lookup/delete, load_prog
operations (that are accessible to both root and non-root), the root user
gains one more operations: bpf_get_global_id(ufd) that returns
global map_id or prog_id. This id can be attached to global events
like tracing. Non-root users lose ability to do delete_map and
unload_prog (they do close(ufd) instead), so this ops are for root
only and operate on global ids.
This is the cleanest way I could think of to combine non-root
security, per-process id and global id all in one API. Thoughts?
I think I'm okay with this part, although an interface to get a map fd
given some reference to the program (in sysfs) that uses it would also
work and maybe be more straightforward.
If you meant debugfs, then yes. I'm planning to add a way for root
to see all loaded programs and maps (similar to /proc as lsmod does),
and then do bpf_map_delete/bpf_prog_unload (similar to rmmod)

setsockopt and seccomp will be non-root and programs will go
through additional dont_leak_pointers check in verifier.
tracing/dtrace will be for root only, since they would need to attach
to global events.

I think it will be cleaner once I finish fd conversion as a patch.
OK

FWIW, per-process local id maps sound almost equivalent to relocations
-- the latter could be as simple as an extra nlattr giving a list of
pairs of (per-eBPF-program id, fd).

My current binutils mess is mainly just because I'm trying to do
something weird with an old, old file format that needs to support
lots of legacy tools.  You won't have that problem.

--Andy
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help