Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
From: Martin KaFai Lau <hidden>
Date: 2022-05-26 00:04:06
Also in:
bpf
On Wed, May 25, 2022 at 02:25:54PM -0700, sdf@google.com wrote:
On 05/25, Martin KaFai Lau wrote:quoted
On Wed, May 25, 2022 at 10:02:07AM -0700, Stanislav Fomichev wrote:quoted
On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev [off-list ref]wrote:quoted
quoted
On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko [off-list ref] wrote:quoted
On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev[off-list ref] wrote:quoted
quoted
quoted
quoted
On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko [off-list ref] wrote:quoted
On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau[off-list ref] wrote:quoted
quoted
quoted
quoted
quoted
quoted
On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichevwrote:quoted
quoted
quoted
quoted
quoted
quoted
quoted
On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau[off-list ref] wrote:quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
On Wed, May 18, 2022 at 03:55:25PM -0700, StanislavFomichev wrote:quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
We have two options: 1. Treat all BPF_LSM_CGROUP the same, regardless ofattach_btf_idquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
2. Treat BPF_LSM_CGROUP+attach_btf_id as a separatehook pointquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
I was doing (2) in the original patch, but switchingto (1) here:quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
* bpf_prog_query returns all attached BPF_LSM_CGROUPprogramsquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
regardless of attach_btf_id * attach_btf_id is exported via bpf_prog_info Signed-off-by: Stanislav Fomichev <redacted> --- include/uapi/linux/bpf.h | 5 ++ kernel/bpf/cgroup.c | 103+++++++++++++++++++++++++++------------quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
kernel/bpf/syscall.c | 4 +- 3 files changed, 81 insertions(+), 31 deletions(-)diff --git a/include/uapi/linux/bpf.hb/include/uapi/linux/bpf.hquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
index b9d2d6de63a7..432fc5f49567 100644--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h@@ -1432,6 +1432,7 @@ union bpf_attr { __u32 attach_flags; __aligned_u64 prog_ids; __u32 prog_cnt; + __aligned_u64 prog_attach_flags; /*output: per-program attach_flags */quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
} query; struct { /* anonymous struct used byBPF_RAW_TRACEPOINT_OPEN command */quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
@@ -5911,6 +5912,10 @@ struct bpf_prog_info { __u64 run_cnt; __u64 recursion_misses; __u32 verified_insns; + /* BTF ID of the function to attach to withinBTF object identifiedquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ * by btf_id. + */ + __u32 attach_btf_func_id; } __attribute__((aligned(8))); struct bpf_map_info {diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index a959cdd22870..08a1015ee09e 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c@@ -1074,6 +1074,7 @@ static intcgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
static int __cgroup_bpf_query(struct cgroup *cgrp,const union bpf_attr *attr,quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
union bpf_attr __user*uattr)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
{ + __u32 __user *prog_attach_flags =u64_to_user_ptr(attr->query.prog_attach_flags);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
__u32 __user *prog_ids =u64_to_user_ptr(attr->query.prog_ids);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
enum bpf_attach_type type =attr->query.attach_type;quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
enum cgroup_bpf_attach_type atype;@@ -1081,50 +1082,92 @@ static int__cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
struct hlist_head *progs; struct bpf_prog *prog; int cnt, ret = 0, i; + int total_cnt = 0; u32 flags; - atype = to_cgroup_bpf_attach_type(type); - if (atype < 0) - return -EINVAL; + enum cgroup_bpf_attach_type from_atype, to_atype; - progs = &cgrp->bpf.progs[atype]; - flags = cgrp->bpf.flags[atype]; + if (type == BPF_LSM_CGROUP) { + from_atype = CGROUP_LSM_START; + to_atype = CGROUP_LSM_END; + } else { + from_atype =to_cgroup_bpf_attach_type(type);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ if (from_atype < 0) + return -EINVAL; + to_atype = from_atype; + } - effective =rcu_dereference_protected(cgrp->bpf.effective[atype],quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-lockdep_is_held(&cgroup_mutex));quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ for (atype = from_atype; atype <= to_atype;atype++) {quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ progs = &cgrp->bpf.progs[atype]; + flags = cgrp->bpf.flags[atype]; - if (attr->query.query_flags &BPF_F_QUERY_EFFECTIVE)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- cnt = bpf_prog_array_length(effective); - else - cnt = prog_list_length(progs); + effective =rcu_dereference_protected(cgrp->bpf.effective[atype],quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+lockdep_is_held(&cgroup_mutex));quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- if (copy_to_user(&uattr->query.attach_flags,&flags, sizeof(flags)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- return -EFAULT; - if (copy_to_user(&uattr->query.prog_cnt, &cnt,sizeof(cnt)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ if (attr->query.query_flags &BPF_F_QUERY_EFFECTIVE)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ total_cnt +=bpf_prog_array_length(effective);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ else + total_cnt +=prog_list_length(progs);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ } + + if (type != BPF_LSM_CGROUP) + if(copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ return -EFAULT; + if (copy_to_user(&uattr->query.prog_cnt,&total_cnt, sizeof(total_cnt)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
return -EFAULT; - if (attr->query.prog_cnt == 0 || !prog_ids ||!cnt)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ if (attr->query.prog_cnt == 0 || !prog_ids ||!total_cnt)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
/* return early if user requested onlyprogram count + flags */quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
return 0; - if (attr->query.prog_cnt < cnt) { - cnt = attr->query.prog_cnt; + + if (attr->query.prog_cnt < total_cnt) { + total_cnt = attr->query.prog_cnt; ret = -ENOSPC; } - if (attr->query.query_flags &BPF_F_QUERY_EFFECTIVE) {quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- returnbpf_prog_array_copy_to_user(effective, prog_ids, cnt);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- } else { - struct bpf_prog_list *pl; - u32 id; + for (atype = from_atype; atype <= to_atype;atype++) {quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ if (total_cnt <= 0) + break; - i = 0; - hlist_for_each_entry(pl, progs, node) { - prog = prog_list_prog(pl); - id = prog->aux->id; - if (copy_to_user(prog_ids + i,&id, sizeof(id)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
- return -EFAULT; - if (++i == cnt) - break; + progs = &cgrp->bpf.progs[atype]; + flags = cgrp->bpf.flags[atype]; + + effective =rcu_dereference_protected(cgrp->bpf.effective[atype],quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+lockdep_is_held(&cgroup_mutex));quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ + if (attr->query.query_flags &BPF_F_QUERY_EFFECTIVE)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ cnt =bpf_prog_array_length(effective);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ else + cnt = prog_list_length(progs); + + if (cnt >= total_cnt) + cnt = total_cnt; + + if (attr->query.query_flags &BPF_F_QUERY_EFFECTIVE) {quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ ret =bpf_prog_array_copy_to_user(effective, prog_ids, cnt);quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ } else { + struct bpf_prog_list *pl; + u32 id; + + i = 0; + hlist_for_each_entry(pl, progs,node) {quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ prog = prog_list_prog(pl); + id = prog->aux->id; + if(copy_to_user(prog_ids + i, &id, sizeof(id)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ return -EFAULT; + if (++i == cnt) + break; + } } + + if (prog_attach_flags) + for (i = 0; i < cnt; i++) + if(copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+ return -EFAULT; + + prog_ids += cnt; + total_cnt -= cnt; + if (prog_attach_flags) + prog_attach_flags += cnt; } return ret; }diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5ed2093e51cc..4137583c04a2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c@@ -3520,7 +3520,7 @@ static int bpf_prog_detach(constunion bpf_attr *attr)quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
} } -#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt +#define BPF_PROG_QUERY_LAST_FIELDquery.prog_attach_flagsquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
static int bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)@@ -3556,6 +3556,7 @@ static int bpf_prog_query(constunion bpf_attr *attr,quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
case BPF_CGROUP_SYSCTL: case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: + case BPF_LSM_CGROUP: return cgroup_bpf_prog_query(attr, uattr); case BPF_LIRC_MODE2: return lirc_prog_query(attr, uattr);@@ -4066,6 +4067,7 @@ static intbpf_prog_get_info_by_fd(struct file *file,quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
if (prog->aux->btf) info.btf_id = btf_obj_id(prog->aux->btf); + info.attach_btf_func_id =prog->aux->attach_btf_id;quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
Note that exposing prog->aux->attach_btf_id only may notbe enoughquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
unless it can assume info.attach_btf_id is alwaysreferring to btf_vmlinuxquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
for all bpf prog types.We also export btf_id two lines above, right? Btw, I lefta comment inquoted
quoted
quoted
quoted
quoted
quoted
quoted
the bpftool about those btf_ids, I'm not sure how resolvethem andquoted
quoted
quoted
quoted
quoted
quoted
quoted
always assume vmlinux for now.yeah, that btf_id above is the cgroup-lsm prog's btf_idwhich has itsquoted
quoted
quoted
quoted
quoted
quoted
func info, line info...etc. It is not the one theattach_btf_id correspondquoted
quoted
quoted
quoted
quoted
quoted
to. attach_btf_id refers to either aux->attach_btf oraux->dst_prog's btf (orquoted
quoted
quoted
quoted
quoted
quoted
target btf id here). It needs a consensus on where this attach_btf_id, target btfid, andquoted
quoted
quoted
quoted
quoted
quoted
prog_attach_flags should be. If I read the patch 7 threadcorrectly,quoted
quoted
quoted
quoted
quoted
quoted
I think Andrii is suggesting to expose them to userspacethrough link, soquoted
quoted
quoted
quoted
quoted
quoted
potentially putting them in bpf_link_info. Thebpf_prog_query willquoted
quoted
quoted
quoted
quoted
quoted
output a list of link ids. The same probably applies toYep and I think it makes sense because link is representing one specific attachment (and I presume flags can be stored insidethe linkquoted
quoted
quoted
quoted
quoted
itself as well, right?). But if legacy non-link BPF_PROG_ATTACH is supported then using bpf_link_info won't cover legacy prog-only attachments.I don't have any attachment to the legacy apis, I'm supportingthemquoted
quoted
quoted
quoted
only because it takes two lines of code; we can go link-only iftherequoted
quoted
quoted
quoted
is an agreement that it's inherently better. How about I keep sys_bpf(BPF_PROG_QUERY) as is and I do a loopin thequoted
quoted
quoted
quoted
userspace (for BPF_LSM_CGROUP only) over all links (BPF_LINK_GET_NEXT_ID) and will find the the ones with matchingprogquoted
quoted
quoted
quoted
ids (BPF_LINK_GET_FD_BY_ID+BPF_OBJ_GET_INFO_BY_FD)? That way we keep new fields in bpf_link_info, but we don't have to extend sys_bpf(BPF_PROG_QUERY) because there doesn't seem to bea goodquoted
quoted
quoted
quoted
way to do it. Exporting links via new link_fds would mean we'dhave toquoted
quoted
quoted
quoted
support BPF_F_QUERY_EFFECTIVE, but getting an effective array oflinksquoted
quoted
quoted
quoted
seems to be messy. If, in the future, we figure out a better wayto I don't see a clean way to get effective array from one individual link[_info] through link iteration. effective array is the progs that will be run at a cgroup and in such order. The prog running at a cgroup doesn't necessarily linked to that cgroup.Yeah, that's the problem with exposing links via prog_info; getting an effective list is painful.quoted
If staying with BPF_PROG_QUERY+BPF_F_QUERY_EFFECTIVE to get effective array and if it is decided the addition should be done in bpf_link_info, then a list of link ids needs to be output instead of the current list of prog ids. The old attach type will still have to stay with the list of prog ids though :/quoted
It will be sad not to be able to get effective only for BPF_LSM_CGROUP. I found it more useful to show what will be run at a cgroup and in such order instead of what is linked to a cgroup.See my hacky proof-of-concept below (on top of this series).
yeah. the PoC makes sense and I don't mind that considering adding them to bpf_link_info (or bpf_prog_info) will be useful in general even without this use case. A quick thought is this is sort of partly going back to v6 but just iterating different things instead of the bpf_lsm hooks.
I think if we keep prog_info as is (don't export anything new, don't export the list of links), iterating through all links on the host should work, right? We get prog_ids list (effective or not, doesn't matter), then we go through all the links and find the ones with with the same prog_id (we can ignore cgroup, it shouldn't matter). Then we can export attach_type/attach_btf_id/etc. If it happens to be slow in the future, we can improve with some tbd interface to get the list of links for cgroup (and then we'd have to care about effective list). But the problem with going link-only is that I'd have to teach bpftool to use links for BPF_LSM_CGROUP and it brings a bunch of problems: * I'd have to pin those links somewhere to make them stick around * Those pin paths essentially become an API now because "detach" now depends on them? * (right now it automatically works with the legacy apis without any changes)
It is already the current API for all links (tracing, cgroup...). It goes away (detach) with the process unless it is pinned. but yeah, it will be a new exception in the "bpftool cgroup" subcommand only for BPF_LSM_CGROUP. If it is an issue with your use case, may be going back to v6 that extends the query bpf_attr with attach_btf_id and support both attach API ?