Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
From: Stanislav Fomichev <hidden>
Date: 2022-05-25 17:02:24
Also in:
bpf
On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev [off-list ref] wrote:
On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko [off-list ref] wrote:
On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev [off-list ref] wrote:
On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko [off-list ref] wrote:
On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau [off-list ref] wrote:
On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichev wrote:
On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau [off-list ref] wrote:
On Wed, May 18, 2022 at 03:55:25PM -0700, Stanislav Fomichev wrote:
We have two options:

1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id
2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point

I was doing (2) in the original patch, but switching to (1) here:

* bpf_prog_query returns all attached BPF_LSM_CGROUP programs
  regardless of attach_btf_id
* attach_btf_id is exported via bpf_prog_info

Signed-off-by: Stanislav Fomichev <redacted>
---
 include/uapi/linux/bpf.h |   5 ++
 kernel/bpf/cgroup.c      | 103 +++++++++++++++++++++++++++------------
 kernel/bpf/syscall.c     |   4 +-
 3 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b9d2d6de63a7..432fc5f49567 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1432,6 +1432,7 @@ union bpf_attr {
 		__u32		attach_flags;
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
+		__aligned_u64	prog_attach_flags; /* output: per-program attach_flags */
 	} query;

 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
@@ -5911,6 +5912,10 @@ struct bpf_prog_info {
 	__u64 run_cnt;
 	__u64 recursion_misses;
 	__u32 verified_insns;
+	/* BTF ID of the function to attach to within BTF object identified
+	 * by btf_id.
+	 */
+	__u32 attach_btf_func_id;
 } __attribute__((aligned(8)));

 struct bpf_map_info {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a959cdd22870..08a1015ee09e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1074,6 +1074,7 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 			      union bpf_attr __user *uattr)
 {
+	__u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
 	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
 	enum bpf_attach_type type = attr->query.attach_type;
 	enum cgroup_bpf_attach_type atype;
@@ -1081,50 +1082,92 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	struct hlist_head *progs;
 	struct bpf_prog *prog;
 	int cnt, ret = 0, i;
+	int total_cnt = 0;
 	u32 flags;

-	atype = to_cgroup_bpf_attach_type(type);
-	if (atype < 0)
-		return -EINVAL;
+	enum cgroup_bpf_attach_type from_atype, to_atype;

-	progs = &cgrp->bpf.progs[atype];
-	flags = cgrp->bpf.flags[atype];
+	if (type == BPF_LSM_CGROUP) {
+		from_atype = CGROUP_LSM_START;
+		to_atype = CGROUP_LSM_END;
+	} else {
+		from_atype = to_cgroup_bpf_attach_type(type);
+		if (from_atype < 0)
+			return -EINVAL;
+		to_atype = from_atype;
+	}

-	effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
-					      lockdep_is_held(&cgroup_mutex));
+	for (atype = from_atype; atype <= to_atype; atype++) {
+		progs = &cgrp->bpf.progs[atype];
+		flags = cgrp->bpf.flags[atype];

-	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
-		cnt = bpf_prog_array_length(effective);
-	else
-		cnt = prog_list_length(progs);
+		effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
+						      lockdep_is_held(&cgroup_mutex));

-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
+		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
+			total_cnt += bpf_prog_array_length(effective);
+		else
+			total_cnt += prog_list_length(progs);
+	}
+
+	if (type != BPF_LSM_CGROUP)
+		if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+			return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
 		return -EFAULT;
-	if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
+	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
 		return 0;
-	if (attr->query.prog_cnt < cnt) {
-		cnt = attr->query.prog_cnt;
+
+	if (attr->query.prog_cnt < total_cnt) {
+		total_cnt = attr->query.prog_cnt;
 		ret = -ENOSPC;
 	}

-	if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
-		return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
-	} else {
-		struct bpf_prog_list *pl;
-		u32 id;
+	for (atype = from_atype; atype <= to_atype; atype++) {
+		if (total_cnt <= 0)
+			break;

-		i = 0;
-		hlist_for_each_entry(pl, progs, node) {
-			prog = prog_list_prog(pl);
-			id = prog->aux->id;
-			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
-				return -EFAULT;
-			if (++i == cnt)
-				break;
+		progs = &cgrp->bpf.progs[atype];
+		flags = cgrp->bpf.flags[atype];
+
+		effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
+						      lockdep_is_held(&cgroup_mutex));
+
+		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
+			cnt = bpf_prog_array_length(effective);
+		else
+			cnt = prog_list_length(progs);
+
+		if (cnt >= total_cnt)
+			cnt = total_cnt;
+
+		if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
+			ret = bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
+		} else {
+			struct bpf_prog_list *pl;
+			u32 id;
+
+			i = 0;
+			hlist_for_each_entry(pl, progs, node) {
+				prog = prog_list_prog(pl);
+				id = prog->aux->id;
+				if (copy_to_user(prog_ids + i, &id, sizeof(id)))
+					return -EFAULT;
+				if (++i == cnt)
+					break;
+			}
 		}
+
+		if (prog_attach_flags)
+			for (i = 0; i < cnt; i++)
+				if (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
+					return -EFAULT;
+
+		prog_ids += cnt;
+		total_cnt -= cnt;
+		if (prog_attach_flags)
+			prog_attach_flags += cnt;
 	}
 	return ret;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5ed2093e51cc..4137583c04a2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	}
 }

-#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
+#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags

 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
@@ -3556,6 +3556,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_SYSCTL:
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
+	case BPF_LSM_CGROUP:
 		return cgroup_bpf_prog_query(attr, uattr);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
@@ -4066,6 +4067,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 	if (prog->aux->btf)
 		info.btf_id = btf_obj_id(prog->aux->btf);
+	info.attach_btf_func_id = prog->aux->attach_btf_id;

Note that exposing prog->aux->attach_btf_id only may not be enough unless it can assume info.attach_btf_id is always referring to btf_vmlinux for all bpf prog types.

We also export btf_id two lines above, right? Btw, I left a comment in the bpftool about those btf_ids, I'm not sure how to resolve them and always assume vmlinux for now.

yeah, that btf_id above is the cgroup-lsm prog's btf_id which has its func info, line info, etc. It is not the one the attach_btf_id corresponds to. attach_btf_id refers to either aux->attach_btf or aux->dst_prog's btf (or target btf id here). It needs a consensus on where this attach_btf_id, target btf id, and prog_attach_flags should be. If I read the patch 7 thread correctly, I think Andrii is suggesting to expose them to userspace through link, so potentially putting them in bpf_link_info. The bpf_prog_query will output a list of link ids.
The same probably applies to

Yep, and I think it makes sense because a link represents one specific attachment (and I presume flags can be stored inside the link itself as well, right?). But if legacy non-link BPF_PROG_ATTACH is supported, then using bpf_link_info won't cover legacy prog-only attachments.

I don't have any attachment to the legacy APIs; I'm supporting them only because it takes two lines of code. We can go link-only if there is agreement that it's inherently better. How about I keep sys_bpf(BPF_PROG_QUERY) as is and do a loop in userspace (for BPF_LSM_CGROUP only) over all links (BPF_LINK_GET_NEXT_ID) to find the ones with matching prog ids (BPF_LINK_GET_FD_BY_ID + BPF_OBJ_GET_INFO_BY_FD)? That way we keep the new fields in bpf_link_info, but we don't have to extend sys_bpf(BPF_PROG_QUERY), because there doesn't seem to be a good way to do it. Exporting links via new link_fds would mean we'd have to support BPF_F_QUERY_EFFECTIVE, but getting an effective array of links seems to be messy. If, in the future, we figure out a better way to expose a list of attached/effective links per cgroup, we can convert/optimize bpftool.

Why not use an iter/bpf_link program (see progs/bpf_iter_bpf_link.c for an example) instead? Once you have struct bpf_link and you know it's a cgroup link, you can cast it to struct bpf_cgroup_link and get access to prog and cgroup. From cgroup to cgroup_bpf you can even get access to the effective array. Basically, whatever the kernel has access to, you can access from bpftool without extending any UAPIs.

Seems a bit too involved just to read back the fields? I might as well use drgn? I'm also not sure about the implementation: will I be able to upcast bpf_link to bpf_cgroup_link in the bpf prog? And getting attach_type might be problematic from the iterator program as well: I need to call the kernel's bpf_lsm_attach_type_get to find atype for attach_btf_id, so I'd have to export it as a kfunc?
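For readers following along, the userspace loop proposed earlier in this thread (BPF_LINK_GET_NEXT_ID, then BPF_LINK_GET_FD_BY_ID, then BPF_OBJ_GET_INFO_BY_FD, keeping links whose prog_id matches) might look roughly like the sketch below. It uses raw bpf(2) syscalls and only fields that already exist in struct bpf_link_info; the helper names sys_bpf, prog_id_wanted and find_links_for_progs are made up for illustration, and the new per-link fields discussed here are not read because they are not part of the UAPI in this patch. It is not bpftool's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper (hypothetical name); glibc does not export bpf(). */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Hypothetical helper: is prog_id among the IDs BPF_PROG_QUERY returned? */
static bool prog_id_wanted(const uint32_t *want, size_t n, uint32_t prog_id)
{
	for (size_t i = 0; i < n; i++)
		if (want[i] == prog_id)
			return true;
	return false;
}

/*
 * Walk every link ID visible to the caller and record the IDs of links
 * whose program is listed in want[]. At most cap IDs are stored.
 * Returns the number of matching links, or a negative errno.
 */
static int find_links_for_progs(const uint32_t *want, size_t n_want,
				uint32_t *link_ids, size_t cap)
{
	union bpf_attr attr;
	struct bpf_link_info info;
	uint32_t id = 0;
	int matches = 0, fd, err;

	assert(want || n_want == 0);
	if (n_want == 0)
		return 0;	/* nothing to match, skip the syscalls */

	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.start_id = id;
		if (sys_bpf(BPF_LINK_GET_NEXT_ID, &attr)) {
			if (errno == ENOENT)
				break;		/* no more links */
			return -errno;
		}
		id = attr.next_id;

		memset(&attr, 0, sizeof(attr));
		attr.link_id = id;
		fd = (int)sys_bpf(BPF_LINK_GET_FD_BY_ID, &attr);
		if (fd < 0) {
			if (errno == ENOENT)
				continue;	/* link went away meanwhile */
			return -errno;
		}

		memset(&info, 0, sizeof(info));
		memset(&attr, 0, sizeof(attr));
		attr.info.bpf_fd = fd;
		attr.info.info_len = sizeof(info);
		attr.info.info = (uint64_t)(uintptr_t)&info;
		err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr) ? -errno : 0;
		close(fd);
		if (err)
			return err;

		if (!prog_id_wanted(want, n_want, info.prog_id))
			continue;
		if ((size_t)matches < cap)
			link_ids[matches] = info.id;
		matches++;
	}
	return matches;
}
```

A caller would pass the prog IDs it got back from BPF_PROG_QUERY as want[]. Walking link IDs normally needs CAP_SYS_ADMIN, so an unprivileged caller should expect a negative errno. The cost is one pass over all links in the system per query.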
I've prototyped what I suggested above, and there is another problem with going link-only: bpftool currently uses bpf_prog_attach unconditionally; we'd have to change that to use links for BPF_LSM_CGROUP (and pin them in some hard-coded locations?) :-(

I'm leaning towards keeping those legacy APIs around and exporting via prog_info; there doesn't seem to be a clear benefit to going link-only :-(
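If BPF_PROG_QUERY is kept as in this patch, userspace reads back two parallel arrays: prog_ids and prog_attach_flags. The way the patched __cgroup_bpf_query() fills them (sum the counts over the atype range, clamp to the caller's prog_cnt, then copy slot by slot and repeat each slot's flags once per program) can be modelled in plain userspace C. This is only a simplified model of the non-effective path; struct slot_model and query_model are hypothetical names, and nothing here is kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cgroup attach slot (one atype). */
struct slot_model {
	const uint32_t *prog_ids;	/* programs attached in this slot */
	uint32_t cnt;			/* number of entries in prog_ids */
	uint32_t flags;			/* attach flags shared by the slot */
};

/*
 * Pass 1 sums the per-slot counts; pass 2 copies at most user_cap IDs
 * and writes each slot's flags once per copied program.
 * *total_out always receives the untruncated total.
 * Returns 0, or -ENOSPC when the output had to be truncated.
 */
static int query_model(const struct slot_model *slots, size_t nslots,
		       uint32_t *ids_out, uint32_t *flags_out,
		       uint32_t user_cap, uint32_t *total_out)
{
	uint32_t total = 0, remaining;
	size_t s;
	int ret = 0;

	assert(slots && total_out);

	for (s = 0; s < nslots; s++)
		total += slots[s].cnt;
	*total_out = total;

	/* Caller asked for the count only. */
	if (user_cap == 0 || !ids_out || total == 0)
		return 0;

	remaining = total;
	if (user_cap < total) {
		remaining = user_cap;
		ret = -ENOSPC;
	}

	for (s = 0; s < nslots && remaining > 0; s++) {
		uint32_t cnt = slots[s].cnt, i;

		if (cnt > remaining)
			cnt = remaining;

		for (i = 0; i < cnt; i++) {
			ids_out[i] = slots[s].prog_ids[i];
			if (flags_out)
				flags_out[i] = slots[s].flags;
		}

		/* Advance both output cursors in lockstep. */
		ids_out += cnt;
		if (flags_out)
			flags_out += cnt;
		remaining -= cnt;
	}
	return ret;
}
```

With two slots holding IDs {10, 11} and {20}, a capacity of 3 yields all three IDs with the first slot's flags repeated twice; a capacity of 2 yields the first two IDs and -ENOSPC, while the reported total stays 3.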