Thread (11 messages) 11 messages, 3 authors, 2021-09-21

Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation

From: Joanne Koong <hidden>
Date: 2021-09-20 21:05:37

On 9/17/21 10:01 AM, Alexei Starovoitov wrote:
On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
quoted
+
+/* For bloom filter maps, the next 4 bits represent how many hashes to use.
+ * The maximum number of hash functions supported is 15. If this is not set,
+ * the default number of hash functions used will be 5.
+ */
+	BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
+	BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
+	BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
+	BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
The bit selection is unintuitive.
Since key_size has to be zero may be used that instead to indicate the number of hash
functions in the rare case when 5 is not good enough?
Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
It could be a union:
     __u32   max_entries;    /* max number of entries in a map */
     __u32   map_flags;      /* BPF_MAP_CREATE related
                              * flags defined above.
                              */
     union {
        __u32  inner_map_fd;   /* fd pointing to the inner map */
        __u32  nr_hash_funcs;  /* or number of hash functions */
     };
     __u32   numa_node;      /* numa node */
I really like the idea of union-ing inner_map_fd with the number of hash 
functions (my worry with
using key_size is that it might be a confusing / non-intuitive API quirk 
for users), but I think this
would later require us to add some bloom filter specific APIs to libbpf 
(such as bpf_map__set_nr_hashes).

To make the bit selection more intuitive, Andrii suggested defining some 
helper like

BPF_F_BLOOM_NR_HASH_OFF = 13

where the user could then do something like

struct {
     __uint(type, BPF_MAP_TYPE_BLOOM_FILTER),
     ...
     __uint(map_flags, 5 << BPF_F_BLOOM_NR_HASH_OFF),
};

to set the number of hash functions.

Would this approach address your concerns about the unintuitiveness of 
the bit selection?
quoted
+struct bpf_bloom_filter {
+	struct bpf_map map;
+	u32 bit_array_mask;
+	u32 hash_seed;
+	/* If the size of the values in the bloom filter is u32 aligned,
+	 * then it is more performant to use jhash2 as the underlying hash
+	 * function, else we use jhash. This tracks the number of u32s
+	 * in an u32-aligned value size. If the value size is not u32 aligned,
+	 * this will be 0.
+	 */
+	u32 aligned_u32_count;
what is the performance difference?
Using results from the hashmap benchmark tests, using jhash2 instead of 
jhash for 4-byte
aligned value sizes improved the performance by roughly 5% to 15%. For 
non-4-byte aligned
value sizes, there wasn't a noticeable difference between using jhash2 
(and truncating the
remainder bits) vs. using jhash.
May be we enforce 4-byte sized value for simplicity?
Sounds great! And if in the future this becomes too restrictive, we 
could always loosen this
as well
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help