Re: [PATCH bpf-next v4 06/10] bpf: Track provenance for pointers formed from... | netdev

Re: [PATCH bpf-next v4 06/10] bpf: Track provenance for pointers formed from referenced PTR_TO_BTF_ID

From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date: 2021-12-19 19:56:08
Also in: bpf, netfilter-devel

On Mon, Dec 20, 2021 at 12:38:10AM IST, Alexei Starovoitov wrote:

On Sun, Dec 19, 2021 at 11:40:44PM +0530, Kumar Kartikeya Dwivedi wrote:

quoted

On Sun, Dec 19, 2021 at 11:13:18PM IST, Alexei Starovoitov wrote:

quoted

On Sat, Dec 18, 2021 at 9:25 PM Kumar Kartikeya Dwivedi
[off-list ref] wrote:

quoted

On Sun, Dec 19, 2021 at 10:35:18AM IST, Alexei Starovoitov wrote:

quoted

On Sat, Dec 18, 2021 at 8:33 PM Kumar Kartikeya Dwivedi
[off-list ref] wrote:

quoted

It is, but into parent_ref_obj_id, to match during release_reference.

quoted

Shouldn't r2 get a different ref_obj_id after r2 = r1->next ?

It's ref_obj_id is still 0.

Thinking about this more, we actually only need 1 extra bit of information in
reg_state, not even a new member. We can simply copy ref_obj_id and set this
bit, then we can reject this register during release but consider it during
release_reference.

It seems to me that this patch created the problem and it's trying
to fix it at the same time.

Yes, sort of. Maybe I need to improve the commit message? I give an example
below, and the first half of commit explains that if we simply did copy
ref_obj_id, it would lead to the case in the previous mail (same BTF ID ptr can
be passed), so we need to do something different.

Maybe that is what is confusing you.

I'm still confused.
Why does mark_btf_ld_reg() need to copy ref_obj_id ?
It should keep it as zero.

So that we can find deref pointers obtained from the reg having that ref_obj_id
when it is released, and invalidate them. But since directly storing in
ref_obj_id of deref dst_reg will be bad (if we get same BTF ID from deref we
could now pass it to release kfunc), we add a new member and match on that.

quoted

mark_btf_ld_reg() is used in deref only.
The ref_obj_id is assigned by check_helper_call().
r2 = r0; will copy it, but
r2 = r0->next; will keep r2->ref_obj_id as zero.

quoted

mark_btf_ld_reg() shouldn't be copying ref_obj_id.
If it keeps it as zero the problem will not happen, no?

It is copying it but writing it to parent_ref_obj_id. It keeps ref_obj_id as 0
for all deref pointers.

r1 = acq(); // r1.ref = acquire_reference_state();
 ref = N
r2 = r1->a; // mark_btf_ld_reg -> copy r1.(ref ?: parent_ref) -> so r2.parent_ref = r1.ref
r3 = r2->b; // mark_btf_ld_reg -> copy r2.(ref ?: parent_ref) -> so r3.parent_ref = r2.parent_ref
r4 = r3->c; // mark_btf_ld_reg -> copy r3.(ref ?: parent_ref) -> so r4.parent_ref = r3.parent_ref
rel(r1);    // if (reg.ref == r1.ref || reg.parent_ref == r1.ref) invalidate(reg)

As you see, mark_btf_ld_reg only ever writes to parent_ref_obj_id, not
ref_obj_id. It just copies ref_obj_id when it is set, over parent_ref_obj_id,
and only one of two can be set.

I don't understand why such logic is needed.

Ok, let me try to explain once how I arrived at it. If you still don't like it,
I'll drop it from the series.

So until this patch, when we do the following:

	struct nf_conn *ct = bpf_xdp_ct_lookup(...);
	if (ct) {
		struct nf_conn *master = ct->master;
		bpf_ct_release(ct);
		unsigned long status = master->status; // I want to prevent this
	}

... this will work, which is ok (as in won't crash the kernel) since the load
will be converted to BPF_PROBE_MEM, but I want to disallow this case since it is
obviously incorrect.

Finally we're talking! This motivation should have been in the commit log
and this thread wouldn't have been that long.

Indeed, sorry for wasting your time. I'll try to do better, and thanks for being
patient.

quoted

The obvious solution (to me) was to kill all registers and stack slots for deref
pointers.

My first naive solution was to simply copy ref_obj_id on mark_btf_ld_reg, so
that it can be matched and released from release_reference.

That what I was guessing.

quoted

But then I noticed that if the BTF ID is same, there is no difference when it is
passed to release kfunc compared to the original register it was loaded from.

	struct nf_conn *ct = bpf_xdp_ct_lookup(...);
	if (ct) {
		struct nf_conn *master = ct->master; // copied ref_obj_id
		bpf_ct_release(master); // works, but shouldn't!
	}

So the code needed some way to distinguish this deref pointer that must be
invalidated only when its 'parent' goes away. Hence the introduction of
parent_ref_obj_id, and the invariant that only one of ref_obj_id or
parent_ref_obj_id must be set.

The goal is clear now, but look at it differently:
struct nf_conn *ct = bpf_xdp_ct_lookup(...);
if (ct) {
  struct nf_conn *master = ct->master;
  struct net *net = ct->ct_net.net;

  bpf_ct_release(ct);
  master->status; // prevent this ?
  net->ifindex;   // but allow this ?

I think both will be prevented with the current logic, no?
net will be ct + offset, so if mark_btf_ld_reg writes PTR_TO_BTF_ID to dst_reg
for net, it will copy ct's reg's ref_obj_id to parent_ref_obj_id of dst_reg (net).
Then on release of ct, net's reg gets killed too since reg[ct]->ref_obj_id
matches its parent_ref_obj_id.

I just tried it out:

7: (85) call bpf_skb_ct_lookup#121995
last_idx 7 first_idx 0
regs=8 stack=0 before 6: (b4) w5 = 4
regs=8 stack=0 before 5: (bf) r4 = r2
regs=8 stack=0 before 4: (b4) w3 = 4
last_idx 7 first_idx 0
regs=20 stack=0 before 6: (b4) w5 = 4
; if (ct) {
8: (15) if r0 == 0x0 goto pc+4
 R0_w=ptr_nf_conn(id=0,ref_obj_id=2,off=0,imm=0) R6_w=invP0 R10=fp0 fp-8=mmmm???? refs=2
; struct net *net = ct->ct_net.net;
9: (79) r6 = *(u64 *)(r0 +208)
; bpf_ct_release(ct);
10: (bf) r1 = r0
11: (85) call bpf_ct_release#121993
; return net->ifindex; // but allow this ?
12: (61) r6 = *(u32 *)(r6 +80)
R6 invalid mem access 'inv'
processed 13 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
-- END PROG LOAD LOG --

Same result when you follow pointer chains.

}
The verifier cannot statically check this. That's why all such deref
are done via BPF_PROBE_MEM (which is the same as probe_read_kernel).
We must disallow use after free when it can cause a crash.
This case is not the one.

That is a valid point, this is certainly in 'nice to have/prevents obvious
misuse' territory, but if this can be done without introducing too much
complexity, I'd like us to do it.

A bit of a digression, but:
I'm afraid this patch is going to be brought up again for a future effort
related to XDP queueing that Toke is working on. We have a similar scenario
there, when xdp_md (aliasing xdp_frame) is dequeued from the PIFO map, and
PTR_TO_PACKET is obtained by reading xdp_md->data. The xdp_md is referenced, so
we need to invalidate these pkt pointers as well, in addition to killing xdp_md
copies. Also this parent_ref_obj_id state allows us to reject comparisons
between pkt pointers pointing into different xdp_md's (when you dequeue more
than one at once and form multiple pkt pointers pointing into different
xdp_mds).
Then it would be required for correctness purposes, but we'll discuss that once
it is posted (but wanted to let you know as an FYI). I can probably also
consider your map_uid suggestion for that, but IMO parent_ref_obj_id is more
generic, and requires less special cases in verifier.

This one, though:
  struct nf_conn *ct = bpf_xdp_ct_lookup(...);
  struct nf_conn *master = ct->master;
  bpf_ct_release(master);
definitely has to be prevented, since it will cause a crash.

As a follow up to this set would be great to allow ptr_to_btf_id
pointers persist longer than program execution.
Users already asked to allow the following:
  map_value = bpf_map_lookup_elem(...);
  struct nf_conn *ct = bpf_xdp_ct_lookup(...);
  map_value->saved_ct = ct;
and some time later in a different or the same program:
  map_value = bpf_map_lookup_elem(...);
  bpf_ct_release(map_value->saved_ct);

Currently folks work around this deficiency by storing some
sort of id and doing extra lookups while performance is suffering.
wdyt?

Very interesting idea! I'm guessing we'll need something akin to bpf_timer
support, i.e. a dedicated type verified using BTF which can be embedded in
map_value? I'll be happy to work on enabling this.

One thought though (just confirming):
If user does map_value->saved_ct = ct, we have to ignore reference leak check
for ct's ref_id, but if they rewrite saved_ct, we would also have to unignore
it, correct?

I think we can make this tracking easier by limiting to one bpf_ptr_to_btf
struct in map_value, then it can simply be part of ptr_to_map_value's reg_state.

--
Kartikeya

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help