Re: [PATCH v5 bpf-next 04/11] bpf: Add map side support for bpf timers.

From: Martin KaFai Lau <hidden>
Date: 2021-07-09 06:05:05
Also in: bpf

On Thu, Jul 08, 2021 at 08:52:23PM -0700, Alexei Starovoitov wrote:

On Thu, Jul 08, 2021 at 06:51:19PM -0700, Martin KaFai Lau wrote:

quoted

+
 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
 static void array_map_free(struct bpf_map *map)
 {

@@ -382,6 +402,7 @@ static void array_map_free(struct bpf_map *map)
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
 
+	array_map_free_timers(map);

array_map_free() is called when map->refcnt reached 0.
By then, map->usercnt should have reached 0 before
and array_map_free_timers() should have already been called,
so no need to call it here again?  The same goes for hashtab.

Not sure it's that simple.
Currently map->usercnt > 0 check is done for bpf_timer_set_callback only,
because prog refcnting is what matters to usercnt and map_release_uref scheme.
bpf_map_init doesn't have this check because there is no circular dependency
prog->map->timer->prog to worry about.
So after usercnt reached zero the prog can still do bpf_timer_init.

Ah. right. missed the bpf_timer_init().

I guess we can add usercnt > 0 to bpf_timer_init as well.
Need to think whether it's enough and the race between atomic64_read(usercnt)
and atomic64_dec_and_test(usercnt) is addressed the same way as the race
in set_callback and cancel_and_free. So far looks like it. Hmm.

hmm... right, checking usercnt > 0 seems ok.
When usercnt is 0, it may be better to also error out instead of allocating
a timer that cannot be used.

I was mostly thinking avoiding changes in map_free could make future map
support a little easier.

quoted

+static void htab_free_malloced_timers(struct bpf_htab *htab)
+{
+	int i;
+
+	rcu_read_lock();
+	for (i = 0; i < htab->n_buckets; i++) {
+		struct hlist_nulls_head *head = select_bucket(htab, i);
+		struct hlist_nulls_node *n;
+		struct htab_elem *l;
+
+		hlist_nulls_for_each_entry(l, n, head, hash_node)

May be put rcu_read_lock/unlock() in the loop and do a
cond_resched() in case the hashtab is large.

Just recalled cond_resched_rcu() may be cleaner, like:

static void htab_free_malloced_timers(struct bpf_htab *htab)
{
        int i;

        rcu_read_lock();
	for (i = 0; i < htab->n_buckets; i++) {
		/* ... */
		hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
			check_and_free_timer(htab, l);
		cond_resched_rcu();
	}
	rcu_read_unlock();
}

Feels a bit like premature optimization. delete_all_elements()
loop right above is doing similar work without cond_resched.
I don't mind cond_resched. I just don't see how to cleanly add it
without breaking rcu_read_lock and overcomplicating the code.

yep, it can be done later together with delete_all_elements().

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help