Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP

From: Daniel Borkmann <daniel@iogearbox.net>
Date: 2017-10-06 14:52:46

On 10/06/2017 12:50 PM, Jesper Dangaard Brouer wrote:

On Thu, 05 Oct 2017 11:40:15 +0200
Daniel Borkmann [off-list ref] wrote:

quoted

On 10/04/2017 02:03 PM, Jesper Dangaard Brouer wrote:
[...]

quoted

+#define CPU_MAP_BULK_SIZE 8  /* 8 == one cacheline on 64-bit archs */
+struct xdp_bulk_queue {
+	void *q[CPU_MAP_BULK_SIZE];
+	unsigned int count;
+};
+
+/* Struct for every remote "destination" CPU in map */
+struct bpf_cpu_map_entry {
+	u32 cpu;    /* kthread CPU and map index */
+	int map_id; /* Back reference to map */

map_id is not used here if I read it correctly? We should
then remove it.

It is actually used in a later patch. Notice, there is no unused
members in the final patch.  I did consider adding back in the later
patch, but it was annoying to during the devel and split-up patch
phase, as it creates conflicts when I move between the different
patches, that need to modify this struct.  Thus, I choose to keep the
end-struct in this cpumap-base-patch.  If you insist, I can go though
the patch-stack and carefully introduce changes to the struct in steps?

It would be great to have every patch as self-contained as possible
since it otherwise makes reviewing much more time consuming to check
through the other patches for possible usage patterns, I noticed you
are using it for the tracepoints in patch 4/5 to dump map_id. Would
definitely be good if you could avoid such split in future sets.

[...]

quoted

+void __cpu_map_queue_destructor(void *ptr)
+{
+	/* For now, just catch this as an error */
+	if (!ptr)
+		return;
+	pr_err("ERROR: %s() cpu_map queue was not empty\n", __func__);

Can you elaborate on this "for now" condition? Is this a race
when kthread doesn't consume queue on thread exit, or should it
be impossible to trigger (should it then be replaced with a
'if (WARN_ON_ONCE(ptr)) page_frag_free(ptr)' and a more elaborate
comment)?

The "for now" is an old comment while developing and testing this.
In this final state of the patchset it _should_ not be possible to
trigger this situation.  I like your idea of replacing it with a
WARN_ON_ONCE.  (as it might be good to keep in some form, as it would
catch is someone changing the code which breaks the RCU+WQ+kthread
tear-down procedure).

Ok.

[...]

quoted

+		/* Updating qsize cause re-allocation of bpf_cpu_map_entry */
+		rcpu = __cpu_map_entry_alloc(qsize, key_cpu, map->id);
+		if (!rcpu)
+			return -ENOMEM;
+	}
+	rcu_read_lock();
+	__cpu_map_entry_replace(cmap, key_cpu, rcpu);
+	rcu_read_unlock();
+	return 0;

You need to update verifier such that this function cannot be called
out of an BPF program,

In the example BPF program, I do a lookup into the map, but only to
verify that an entry exist (I don't look at the value).  I would like
to support such usage.

Ok, put comment below.

quoted

otherwise it would be possible under full RCU
read context, which is explicitly avoided here and also it would otherwise
be allowed for other maps of different type as well, which needs to
be avoided.

Sorry, I don't follow this.

What I meant is that check_map_func_compatibility() should check for
BPF_MAP_TYPE_CPUMAP and only allow func_id of BPF_FUNC_redirect_map
and BPF_FUNC_map_lookup_elem to be used, which I haven't seen the set
restricting it to. Some of your later patches do this for the helper
BPF_FUNC_redirect_map but the important point is that map updates
wouldn't be done out of the BPF program itself, but rather from user
space control path given they can't be done under full RCU read lock
context if I read this correctly (which the programs run in though).

[...]

quoted

+static void *cpu_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_cpu_map_entry *rcpu =
+		__cpu_map_lookup_elem(map, *(u32 *)key);
+
+	return rcpu ? &rcpu->qsize : NULL;

The qsize doesn't seem used anywhere else besides here, but you
probably should update verifier such that this cannot be called
out of the BPF program, which could then mangle qsize value.

It is true that the BPF prog can modify this qsize value, but it's not
the authoritative value, so it doesn't really matter.

As I said above, I do want to do a lookup from a BPF program.  To allow
to BPF program to know, if an entry is valid, else it will blindly
send to a cpu destination.  Maybe bpf_prog's just have to use a
map-on-the-side to coordinate this(?), but then a sysadm modifying the
real cpumap will be invisible to the program.

Maybe we should just disable BPF-progs from reading this in the first
iteration?  It would allow for more advanced usage schemes later..

Okay, what you could do is the following to prevent unintended value
updates. Just read out the qsize from the value and put that into a
per-cpu scratch buffer that gets returned instead of the map value,
so even if the scratch buffer gets mangled, it's okay because the
actual map value doesn't. Verifier still checks the map value access
bounds for that one.

One crazy idea is to have the cpu_map_lookup_elem() return if any
packets are in-flight on this cpu-queue. (Making it easier to avoid OoO
packets, when switching target CPU, but it can also be implemented by
the BPF-programmer herself via maps, although via some extra atomic
cost).

quoted

+}
+
+static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
+	u32 index = key ? *(u32 *)key : U32_MAX;
+	u32 *next = next_key;
+
+	if (index >= cmap->map.max_entries) {
+		*next = 0;
+		return 0;
+	}
+
+	if (index == cmap->map.max_entries - 1)
+		return -ENOENT;
+	*next = index + 1;
+	return 0;
+}

I would have liked to have implemented next_key so it only returned the
next valid cpu_entry, and used it as a simple round-robin scheduler.
But AFAIK the next_key function is not allowed from bpf_prog's, right?

Hm, true we don't export this as a helper right now, I currently don't
see a reason why we couldn't. For the array map, the get_next_key is
probably not too useful given we only return key + 1. For user space it
might help for iterating/dumping the map though. Probably one could
enable it and add a flag to search the next valid entry from the map,
though issue could well be that this might need to iterate/test millions
of empty slots until you get to the next valid one which might not be
suitable in a generic case to do out of a BPF prog, perhaps a second
map with keys to round robin over might help.

Cheers,
Daniel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help