Re: [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise AND on... | netdev

Re: [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise AND on range [-1, 0]

From: Shung-Hsi Yu <hidden>
Date: 2024-07-16 15:19:26
Also in: bpf

On Tue, Jul 16, 2024 at 03:05:11PM GMT, Xu Kuohai wrote:

On 7/15/2024 11:29 PM, Shung-Hsi Yu wrote:

quoted

Cc Harishankar Vishwanathan, Prof. Srinivas Narayana and Prof. Santosh
Nagarakatte, and Matan Shachnai, whom have recently work on
scalar*_min_max_and(); also dropping LSM/FS related mails from Cc since
it's a bit long and I'm not sure whether the mailing list will reject
due to too many email in Cc.

On Thu, Jul 11, 2024 at 07:38:24PM GMT, Xu Kuohai wrote:

quoted

With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts
is rejected by the verifier, and the log says:

0: R1=ctx() R10=fp0
; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
0: (b7) r0 = 0                        ; R0_w=0
1: (79) r2 = *(u64 *)(r1 +0)
func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map'
2: R1=ctx() R2_w=trusted_ptr_bpf_map()
; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29
2: (18) r3 = 0xffff9742c0951a00       ; R3_w=map_ptr(map=data_input,ks=4,vs=4)
4: (5d) if r2 != r3 goto pc+4         ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4)
; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
5: (79) r0 = *(u64 *)(r1 +8)          ; R0_w=scalar() R1=ctx()
; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32
6: (67) r0 <<= 62                     ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000))
7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
;  @ test_libbpf_get_fd_by_id_opts.c:0
8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27
9: (95) exit

And here is the C code of the prog.

SEC("lsm/bpf_map")
int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode)
{
     if (map != (struct bpf_map *)&data_input)
	    return 0;

     if (fmode & FMODE_WRITE)
	    return -EACCES;

     return 0;
}

It is clear that the prog can only return either 0 or -EACCESS, and both
values are legal.

So why is it rejected by the verifier?

The verifier log shows that the second if and return value setting
statements in the prog is optimized to bitwise operations "r0 s>>= 63"
and "r0 &= -13". The verifier correctly deduces that the value of
r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63".
But when the verifier proceeds to verify instruction "r0 &= -13", it
fails to deduce the correct value range of r0.

7: (c7) r0 s>>= 63                    ; R0_w=scalar(smin=smin32=-1,smax=smax32=0)
8: (57) r0 &= -13                     ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))

So why the verifier fails to deduce the result of 'r0 &= -13'?

The verifier uses tnum to track values, and the two ranges "[-1, 0]" and
"[0, -1ULL]" are encoded to the same tnum. When verifing instruction
"r0 &= -13", the verifier erroneously deduces the result from
"[0, -1ULL] AND -13", which is out of the expected return range
[-4095, 0].

As explained by Eduard in [0], the clang transformation that generates this
pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]).

...

quoted

As suggested by Eduard and Andrii, this patch makes a special case
for source or destination register of '&=' operation being in
range [-1, 0].

...

Been wonder whether it possible for a more general approach ever since I
saw the discussion back in April. I think I've finally got something.

The problem we face here is that the tightest bound for the [-1, 0] case
was tracked with signed ranges, yet the BPF verifier looses knowledge of
them all too quickly in scalar*_min_max_and(); knowledge of previous
signed ranges were not used at all to derive the outcome of signed
ranges after BPF_AND.

	static void scalar_min_max_and(...) {
		...
		if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
			dst_reg->smin_value = dst_reg->umin_value;
			dst_reg->smax_value = dst_reg->umax_value;
		} else {
			dst_reg->smin_value = S64_MIN;
			dst_reg->smax_value = S64_MAX;
		}
		...
	}

This is indeed the root cause.

quoted

So looks like its time to be nobody[1] and try to teach BPF verifier how
track signed ranges when ANDing two (possibly) negative numbers. Luckily
bitwise AND is comparatively easier to do than other bitwise operations:
non-negative range & non-negative range is always non-negative,
non-negative range & negative range is still always non-negative, and
negative range & negative range is always negative.

Right, only bitwise ANDing two negatives yields to a negative result.

quoted

smax_value is straight forwards, we can just do

	max(dst_reg->smax_value, src_reg->smax_value)

which works across all sign combinations. Technically for non-negative &
non-negative we can use min() instead of max(), but the non-negative &
non-negative case should be handled pretty well by the unsigned ranges
already; it seems simpler to let such knowledge flows from unsigned
ranges to signed ranges during reg_bounds_sync(). Plus we are not wrong
for non-negative & non-negative by using max(), just imprecise, so no
correctness/soundness issue here.

I think this is correct, since in two's complement, more '1' bits means
more large, regardless of sign, and bitwise AND never generates more '1'
bits.

quoted

smin_value is the tricker one, but doable with

	masked_negative(min(dst_reg->smin_value, src_reg->smin_value))

where masked_negative(v) basically just clear all bits after the most
significant unset bit, effectively rounding a negative value down to a
negative power-of-2 value, and returning 0 for non-negative values. E.g.
for some 8-bit, negative value

	masked_negative(0b11101001) == 0b11100000

Ah, it's really tricky. Seems it's the longest high '1' bits sequence
in both operands. This '1' bits should remain unchanged by the bitwise
AND operation. So this sequence must be in the result, making it the
minimum possible value.

quoted

This can be done with a tweaked version of "Round up to the next highest
power of 2"[2],

	/* Invert the bits so the first unset bit can be propagated with |= */
	v = ~v;
	/* Now propagate the first (previously unset, now set) bit to the
	 * trailing positions */
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	...
	v |= v >> 32; /* Assuming 64-bit */
	/* Propagation done, now invert again */
	v = ~v;

Again, we technically can do better if we take sign bit into account,
but deriving smin_value this way should still be correct/sound across
different sign combinations, and overall should help us derived [-16, 0]
from "[-1, 0] AND -13", thus preventing BPF verifier from rejecting the
program.

---

Alternatively we can employ a range-splitting trick (think I saw this in
[3]) that allow us to take advantage of existing tnum_and() by splitting
the signed ranges into two if the range crosses the sign boundary (i.e.
contains both non-negative and negative values), one range will be
[smin, U64_MAX], the other will be [0, smax]. This way we get around
tnum's weakness of representing [-1, 0] as [0, U64_MAX].

	if (src_reg->smin_value < 0 && src_reg->smax_value >= 0) {
		src_lower = tnum_range(src_reg->smin_value, U64_MAX);
		src_higher = tnum_range(0, src_reg->smax_value);
	} else {
		src_lower = tnum_range(src_reg->smin_value, src_reg->smax_value);
		src_higher = tnum_range(src_reg->smin_value, src_reg->smax_value);
	}

	if (dst_reg->smin_value < 0 && dst_reg->smax_value >= 0) {
		dst_lower = tnum_range(dst_reg->smin_value, U64_MAX);
		dst_higher = tnum_range(0, dst_reg->smax_value);
	} else {
		dst_lower = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
		dst_higher = tnum_range(dst_reg->smin_value, dst_reg->smax_value);
	}

	lower = tnum_and(src_lower, dst_lower);
	higher = tnum_and(src_higher, dst_higher);
	dst->smin_value = lower.value;
	dst->smax_value = higher.value | higher.mask;

This looks even more tricky...

Indeed, and I think the above is still wrong because it did not proper
set smin_value to S64_MIN when needed.

quoted

Personally I like the first method better as it is simpler yet still
does the job well enough. I'll work on that in the next few days and see
if it actually works.

This really sounds great. Thank you for the excellent work!

Sent RFC in sibling thread. I think it would be better if the patch was
included as part of your series. But let's see what the other think of
it first.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help