Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF_AND

From: Shung-Hsi Yu <hidden>
Date: 2024-07-19 08:32:53
Also in: bpf

On Wed, Jul 17, 2024 at 02:10:35PM GMT, Eduard Zingerman wrote:

On Tue, 2024-07-16 at 22:52 +0800, Shung-Hsi Yu wrote:

[...]

quoted

To allow verification of such instruction pattern, update
scalar*_min_max_and() to infer signed ranges directly from signed ranges
of the operands. With BPF_AND, the resulting value always gains more
unset '0' bit, thus it only move towards 0x0000000000000000. The
difficulty lies with how to deal with signs. While non-negative
(positive and zero) value simply grows smaller, a negative number can
grows smaller, but may also underflow and become a larger value.

To better address this situation we split the signed ranges into
negative range and non-negative range cases, ignoring the mixed sign
cases for now; and only consider how to calculate smax_value.

Since negative range & negative range preserve the sign bit, so we know
the result is still a negative value, thus it only move towards S64_MIN,
but never underflow, thus a save bet is to use a value in ranges that is
closet to 0, thus "max(dst_reg->smax_value, src->smax_value)". For
negative range & positive range the sign bit is always cleared, thus we
know the resulting is a non-negative, and only moves towards 0, so a
safe bet is to use smax_value of the non-negative range. Last but not
least, non-negative range & non-negative range is still a non-negative
value, and only moves towards 0; however same as the unsigned range
case, the maximum is actually capped by the lesser of the two, and thus
min(dst_reg->smax_value, src_reg->smax_value);

Listing out the above reasoning as a table (dst_reg abbreviated as dst,
src_reg abbreviated as src, smax_value abbrivated as smax) we get:

                        |                         src_reg
       smax = ?         +---------------------------+---------------------------
                        |        negative           |       non-negative
---------+--------------+---------------------------+---------------------------
         | negative     | max(dst->smax, src->smax) |         src->smax
dst_reg  +--------------+---------------------------+---------------------------
         | non-negative |         dst->smax         | min(dst->smax, src->smax)

However this is quite complicated, luckily it can be simplified given
the following observations

    max(dst_reg->smax_value, src_reg->smax_value) >= src_reg->smax_value
    max(dst_reg->smax_value, src_reg->smax_value) >= dst_reg->smax_value
    max(dst_reg->smax_value, src_reg->smax_value) >= min(dst_reg->smax_value, src_reg->smax_value)

So we could substitute the cells in the table above all with max(...),
and arrive at:

                        |                         src_reg
      smax' = ?         +---------------------------+---------------------------
                        |        negative           |       non-negative
---------+--------------+---------------------------+---------------------------
         | negative     | max(dst->smax, src->smax) | max(dst->smax, src->smax)
dst_reg  +--------------+---------------------------+---------------------------
         | non-negative | max(dst->smax, src->smax) | max(dst->smax, src->smax)

Meaning that simply using

  max(dst_reg->smax_value, src_reg->smax_value)

to calculate the resulting smax_value would work across all sign combinations.


For smin_value, we know that both non-negative range & non-negative
range and negative range & non-negative range both result in a
non-negative value, so an easy guess is to use the minimum non-negative
value, thus 0.

                        |                         src_reg
       smin = ?         +----------------------------+---------------------------
                        |          negative          |       non-negative
---------+--------------+----------------------------+---------------------------
         | negative     |             ?              |             0
dst_reg  +--------------+----------------------------+---------------------------
         | non-negative |             0              |             0

This leave the negative range & negative range case to be considered. We
know that negative range & negative range always yield a negative value,
so a preliminary guess would be S64_MIN. However, that guess is too
imprecise to help with the r0 <<= 62, r0 s>>= 63, r0 &= -13 pattern
we're trying to deal with here.

This can be further improve with the observation that for negative range
& negative range, the smallest possible value must be one that has
longest _common_ most-significant set '1' bits sequence, thus we can use
min(dst_reg->smin_value, src->smin_value) as the starting point, as the
smaller value will be the one with the shorter most-significant set '1'
bits sequence. But that alone is not enough, as we do not know whether
rest of the bits would be set, so the safest guess would be one that
clear alls bits after the most-significant set '1' bits sequence,
something akin to bit_floor(), but for rounding to a negative power-of-2
instead.

    negative_bit_floor(0xffff000000000003) == 0xffff000000000000
    negative_bit_floor(0xf0ff0000ffff0000) == 0xf000000000000000
    negative_bit_floor(0xfffffb0000000000) == 0xfffff80000000000

With negative range & negative range solve, we now have:

                        |                         src_reg
       smin = ?         +----------------------------+---------------------------
                        |        negative            |       non-negative
---------+--------------+----------------------------+---------------------------
         |   negative   |negative_bit_floor(         |             0
         |              |  min(dst->smin, src->smin))|
dst_reg  +--------------+----------------------------+---------------------------
         | non-negative |           0                |             0

This can be further simplied since min(dst->smin, src->smin) < 0 when both
dst_reg and src_reg have a negative range. Which means using

    negative_bit_floor(min(dst_reg->smin_value, src_reg->smin_value)

to calculate the resulting smin_value would work across all sign combinations.

Together these allows us to infer the signed range of the result of BPF_AND
operation using the signed range from its operands.

Hi Shung-Hsi,

This seems quite elegant.
As an additional check, I did a simple brute-force for all possible
ranges of 6-bit integers and bounds are computed safely.

Thanks for looking into this, as well as the complement.

Did took me quite awhile to try come up with a simple solution that
works just well enough without further complication, felt quite proud :)

[...]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help