Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Daniel Borkmann <daniel@iogearbox.net>
Date: 2017-10-06 14:52:46
On 10/06/2017 12:50 PM, Jesper Dangaard Brouer wrote:
On Thu, 05 Oct 2017 11:40:15 +0200 Daniel Borkmann [off-list ref] wrote:quoted
On 10/04/2017 02:03 PM, Jesper Dangaard Brouer wrote: [...]quoted
+#define CPU_MAP_BULK_SIZE 8 /* 8 == one cacheline on 64-bit archs */ +struct xdp_bulk_queue { + void *q[CPU_MAP_BULK_SIZE]; + unsigned int count; +}; + +/* Struct for every remote "destination" CPU in map */ +struct bpf_cpu_map_entry { + u32 cpu; /* kthread CPU and map index */ + int map_id; /* Back reference to map */map_id is not used here if I read it correctly? We should then remove it.It is actually used in a later patch. Notice, there is no unused members in the final patch. I did consider adding back in the later patch, but it was annoying to during the devel and split-up patch phase, as it creates conflicts when I move between the different patches, that need to modify this struct. Thus, I choose to keep the end-struct in this cpumap-base-patch. If you insist, I can go though the patch-stack and carefully introduce changes to the struct in steps?
It would be great to have every patch as self-contained as possible since it otherwise makes reviewing much more time consuming to check through the other patches for possible usage patterns, I noticed you are using it for the tracepoints in patch 4/5 to dump map_id. Would definitely be good if you could avoid such split in future sets. [...]
quoted
quoted
+void __cpu_map_queue_destructor(void *ptr) +{ + /* For now, just catch this as an error */ + if (!ptr) + return; + pr_err("ERROR: %s() cpu_map queue was not empty\n", __func__);Can you elaborate on this "for now" condition? Is this a race when kthread doesn't consume queue on thread exit, or should it be impossible to trigger (should it then be replaced with a 'if (WARN_ON_ONCE(ptr)) page_frag_free(ptr)' and a more elaborate comment)?The "for now" is an old comment while developing and testing this. In this final state of the patchset it _should_ not be possible to trigger this situation. I like your idea of replacing it with a WARN_ON_ONCE. (as it might be good to keep in some form, as it would catch is someone changing the code which breaks the RCU+WQ+kthread tear-down procedure).
Ok. [...]
quoted
quoted
+ /* Updating qsize cause re-allocation of bpf_cpu_map_entry */ + rcpu = __cpu_map_entry_alloc(qsize, key_cpu, map->id); + if (!rcpu) + return -ENOMEM; + } + rcu_read_lock(); + __cpu_map_entry_replace(cmap, key_cpu, rcpu); + rcu_read_unlock(); + return 0;You need to update verifier such that this function cannot be called out of an BPF program,In the example BPF program, I do a lookup into the map, but only to verify that an entry exist (I don't look at the value). I would like to support such usage.
Ok, put comment below.
quoted
otherwise it would be possible under full RCU read context, which is explicitly avoided here and also it would otherwise be allowed for other maps of different type as well, which needs to be avoided.Sorry, I don't follow this.
What I meant is that check_map_func_compatibility() should check for BPF_MAP_TYPE_CPUMAP and only allow func_id of BPF_FUNC_redirect_map and BPF_FUNC_map_lookup_elem to be used, which I haven't seen the set restricting it to. Some of your later patches do this for the helper BPF_FUNC_redirect_map but the important point is that map updates wouldn't be done out of the BPF program itself, but rather from user space control path given they can't be done under full RCU read lock context if I read this correctly (which the programs run in though). [...]
quoted
quoted
+static void *cpu_map_lookup_elem(struct bpf_map *map, void *key) +{ + struct bpf_cpu_map_entry *rcpu = + __cpu_map_lookup_elem(map, *(u32 *)key); + + return rcpu ? &rcpu->qsize : NULL;The qsize doesn't seem used anywhere else besides here, but you probably should update verifier such that this cannot be called out of the BPF program, which could then mangle qsize value.It is true that the BPF prog can modify this qsize value, but it's not the authoritative value, so it doesn't really matter. As I said above, I do want to do a lookup from a BPF program. To allow to BPF program to know, if an entry is valid, else it will blindly send to a cpu destination. Maybe bpf_prog's just have to use a map-on-the-side to coordinate this(?), but then a sysadm modifying the real cpumap will be invisible to the program. Maybe we should just disable BPF-progs from reading this in the first iteration? It would allow for more advanced usage schemes later..
Okay, what you could do is the following to prevent unintended value updates. Just read out the qsize from the value and put that into a per-cpu scratch buffer that gets returned instead of the map value, so even if the scratch buffer gets mangled, it's okay because the actual map value doesn't. Verifier still checks the map value access bounds for that one.
One crazy idea is to have the cpu_map_lookup_elem() return if any packets are in-flight on this cpu-queue. (Making it easier to avoid OoO packets, when switching target CPU, but it can also be implemented by the BPF-programmer herself via maps, although via some extra atomic cost).quoted
quoted
+} + +static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key) +{ + struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map); + u32 index = key ? *(u32 *)key : U32_MAX; + u32 *next = next_key; + + if (index >= cmap->map.max_entries) { + *next = 0; + return 0; + } + + if (index == cmap->map.max_entries - 1) + return -ENOENT; + *next = index + 1; + return 0; +}I would have liked to have implemented next_key so it only returned the next valid cpu_entry, and used it as a simple round-robin scheduler. But AFAIK the next_key function is not allowed from bpf_prog's, right?
Hm, true we don't export this as a helper right now, I currently don't see a reason why we couldn't. For the array map, the get_next_key is probably not too useful given we only return key + 1. For user space it might help for iterating/dumping the map though. Probably one could enable it and add a flag to search the next valid entry from the map, though issue could well be that this might need to iterate/test millions of empty slots until you get to the next valid one which might not be suitable in a generic case to do out of a BPF prog, perhaps a second map with keys to round robin over might help. Cheers, Daniel