Re: [PATCH bpf-next v4 1/2] bpf: allow rewriting to ports under ip_unprivileged_port_start
From: Andrey Ignatov <hidden>
Date: 2021-01-27 18:26:06
Also in:
bpf
Stanislav Fomichev [off-list ref] [Tue, 2021-01-26 11:36 -0800]:
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
to the privileged ones (< ip_unprivileged_port_start), but it will
be rejected later on in the __inet_bind or __inet6_bind.
Let's add another return value to indicate that CAP_NET_BIND_SERVICE
check should be ignored. Use the same idea as we currently use
in cgroup/egress where bit #1 indicates CN. Instead, for
cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should
be bypassed.
v4:
- Add missing IPv6 support (Martin KaFai Lau)
v3:
- Update description (Martin KaFai Lau)
- Fix capability restore in selftest (Martin KaFai Lau)
v2:
- Switch to explicit return code (Martin KaFai Lau)
Cc: Andrey Ignatov <redacted>
Cc: Martin KaFai Lau <redacted>
Signed-off-by: Stanislav Fomichev <redacted>Explicit return code looks much cleaner than both what v1 did and what I proposed earlier (compare port before/after). Just one nit from me but otherwide looks good. Acked-by: Andrey Ignatov <redacted> ...
quoted hunk ↗ jump to hunk
@@ -231,30 +232,48 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, type) \ ({ \ + u32 __unused_flags; \ int __ret = 0; \ if (cgroup_bpf_enabled(type)) \ __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \ - NULL); \ + NULL, \ + &__unused_flags); \ __ret; \ }) #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx) \ ({ \ + u32 __unused_flags; \ int __ret = 0; \ if (cgroup_bpf_enabled(type)) { \ lock_sock(sk); \ __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \ - t_ctx); \ + t_ctx, \ + &__unused_flags); \ release_sock(sk); \ } \ __ret; \ }) -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL) - -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL) +/* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags + * via upper bits of return code. The only flag that is supported + * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check + * should be bypassed. + */ +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags) \ +({ \ + u32 __flags = 0; \ + int __ret = 0; \ + if (cgroup_bpf_enabled(type)) { \ + lock_sock(sk); \ + __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \ + NULL, &__flags); \ + release_sock(sk); \ + if (__flags & 1) \ + *flags |= BIND_NO_CAP_NET_BIND_SERVICE; \
Nit: It took me some time to realize that there are two different
"flags": one to pass to __cgroup_bpf_run_filter_sock_addr() and another
to pass to __inet{,6}_bind/BPF_CGROUP_RUN_PROG_INET_BIND_LOCK that both carry
"BIND_NO_CAP_NET_BIND_SERVICE" flag but do it differently:
* hard-coded 0x1 in the former case;
* and BIND_NO_CAP_NET_BIND_SERVICE == (1 << 3) in the latter.
I'm not sure how to make it more readable: maybe name `flags` and
`__flags` differently to highlight the difference (`bind_flags` and
`__flags`?) and add a #define for the "1" here?
In anycase IMO it's not worth a respin and can be addressed by a
follow-up if you agree.
+ } \ + __ret; \ +}) #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) \ ((cgroup_bpf_enabled(BPF_CGROUP_INET4_CONNECT) || \
-- Andrey Ignatov