Thread (54 messages) 54 messages, 5 authors, 2022-05-31

Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP

From: Martin KaFai Lau <hidden>
Date: 2022-05-26 00:04:06
Also in: bpf

On Wed, May 25, 2022 at 02:25:54PM -0700, sdf@google.com wrote:
On 05/25, Martin KaFai Lau wrote:
quoted
On Wed, May 25, 2022 at 10:02:07AM -0700, Stanislav Fomichev wrote:
quoted
On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev [off-list ref]
wrote:
quoted
quoted
On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko
[off-list ref] wrote:
quoted
On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev
[off-list ref] wrote:
quoted
quoted
quoted
quoted
On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko
[off-list ref] wrote:
quoted
On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau
[off-list ref] wrote:
quoted
quoted
quoted
quoted
quoted
quoted
On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichev
wrote:
quoted
quoted
quoted
quoted
quoted
quoted
quoted
On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau
[off-list ref] wrote:
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
On Wed, May 18, 2022 at 03:55:25PM -0700, Stanislav
Fomichev wrote:
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
We have two options:
1. Treat all BPF_LSM_CGROUP the same, regardless of
attach_btf_id
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate
hook point
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
I was doing (2) in the original patch, but switching
to (1) here:
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
* bpf_prog_query returns all attached BPF_LSM_CGROUP
programs
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
regardless of attach_btf_id
* attach_btf_id is exported via bpf_prog_info

Signed-off-by: Stanislav Fomichev <redacted>
---
 include/uapi/linux/bpf.h |   5 ++
 kernel/bpf/cgroup.c      | 103
+++++++++++++++++++++++++++------------
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 kernel/bpf/syscall.c     |   4 +-
 3 files changed, 81 insertions(+), 31 deletions(-)
diff --git a/include/uapi/linux/bpf.h
b/include/uapi/linux/bpf.h
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
index b9d2d6de63a7..432fc5f49567 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1432,6 +1432,7 @@ union bpf_attr {
              __u32           attach_flags;
              __aligned_u64   prog_ids;
              __u32           prog_cnt;
+             __aligned_u64   prog_attach_flags; /*
output: per-program attach_flags */
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      } query;

      struct { /* anonymous struct used by
BPF_RAW_TRACEPOINT_OPEN command */
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
@@ -5911,6 +5912,10 @@ struct bpf_prog_info {
      __u64 run_cnt;
      __u64 recursion_misses;
      __u32 verified_insns;
+     /* BTF ID of the function to attach to within
BTF object identified
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+      * by btf_id.
+      */
+     __u32 attach_btf_func_id;
 } __attribute__((aligned(8)));

 struct bpf_map_info {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a959cdd22870..08a1015ee09e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1074,6 +1074,7 @@ static int
cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 static int __cgroup_bpf_query(struct cgroup *cgrp,
const union bpf_attr *attr,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
                            union bpf_attr __user
*uattr)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 {
+     __u32 __user *prog_attach_flags =
u64_to_user_ptr(attr->query.prog_attach_flags);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      __u32 __user *prog_ids =
u64_to_user_ptr(attr->query.prog_ids);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      enum bpf_attach_type type =
attr->query.attach_type;
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      enum cgroup_bpf_attach_type atype;
@@ -1081,50 +1082,92 @@ static int
__cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      struct hlist_head *progs;
      struct bpf_prog *prog;
      int cnt, ret = 0, i;
+     int total_cnt = 0;
      u32 flags;

-     atype = to_cgroup_bpf_attach_type(type);
-     if (atype < 0)
-             return -EINVAL;
+     enum cgroup_bpf_attach_type from_atype, to_atype;

-     progs = &cgrp->bpf.progs[atype];
-     flags = cgrp->bpf.flags[atype];
+     if (type == BPF_LSM_CGROUP) {
+             from_atype = CGROUP_LSM_START;
+             to_atype = CGROUP_LSM_END;
+     } else {
+             from_atype =
to_cgroup_bpf_attach_type(type);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             if (from_atype < 0)
+                     return -EINVAL;
+             to_atype = from_atype;
+     }

-     effective =
rcu_dereference_protected(cgrp->bpf.effective[atype],
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-
lockdep_is_held(&cgroup_mutex));
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     for (atype = from_atype; atype <= to_atype;
atype++) {
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             progs = &cgrp->bpf.progs[atype];
+             flags = cgrp->bpf.flags[atype];

-     if (attr->query.query_flags &
BPF_F_QUERY_EFFECTIVE)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-             cnt = bpf_prog_array_length(effective);
-     else
-             cnt = prog_list_length(progs);
+             effective =
rcu_dereference_protected(cgrp->bpf.effective[atype],
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+
lockdep_is_held(&cgroup_mutex));
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-     if (copy_to_user(&uattr->query.attach_flags,
&flags, sizeof(flags)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-             return -EFAULT;
-     if (copy_to_user(&uattr->query.prog_cnt, &cnt,
sizeof(cnt)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             if (attr->query.query_flags &
BPF_F_QUERY_EFFECTIVE)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     total_cnt +=
bpf_prog_array_length(effective);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             else
+                     total_cnt +=
prog_list_length(progs);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     }
+
+     if (type != BPF_LSM_CGROUP)
+             if
(copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     return -EFAULT;
+     if (copy_to_user(&uattr->query.prog_cnt,
&total_cnt, sizeof(total_cnt)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
              return -EFAULT;
-     if (attr->query.prog_cnt == 0 || !prog_ids ||
!cnt)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     if (attr->query.prog_cnt == 0 || !prog_ids ||
!total_cnt)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
              /* return early if user requested only
program count + flags */
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
              return 0;
-     if (attr->query.prog_cnt < cnt) {
-             cnt = attr->query.prog_cnt;
+
+     if (attr->query.prog_cnt < total_cnt) {
+             total_cnt = attr->query.prog_cnt;
              ret = -ENOSPC;
      }

-     if (attr->query.query_flags &
BPF_F_QUERY_EFFECTIVE) {
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-             return
bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-     } else {
-             struct bpf_prog_list *pl;
-             u32 id;
+     for (atype = from_atype; atype <= to_atype;
atype++) {
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             if (total_cnt <= 0)
+                     break;

-             i = 0;
-             hlist_for_each_entry(pl, progs, node) {
-                     prog = prog_list_prog(pl);
-                     id = prog->aux->id;
-                     if (copy_to_user(prog_ids + i,
&id, sizeof(id)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-                             return -EFAULT;
-                     if (++i == cnt)
-                             break;
+             progs = &cgrp->bpf.progs[atype];
+             flags = cgrp->bpf.flags[atype];
+
+             effective =
rcu_dereference_protected(cgrp->bpf.effective[atype],
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+
lockdep_is_held(&cgroup_mutex));
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+
+             if (attr->query.query_flags &
BPF_F_QUERY_EFFECTIVE)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     cnt =
bpf_prog_array_length(effective);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             else
+                     cnt = prog_list_length(progs);
+
+             if (cnt >= total_cnt)
+                     cnt = total_cnt;
+
+             if (attr->query.query_flags &
BPF_F_QUERY_EFFECTIVE) {
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     ret =
bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             } else {
+                     struct bpf_prog_list *pl;
+                     u32 id;
+
+                     i = 0;
+                     hlist_for_each_entry(pl, progs,
node) {
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                             prog = prog_list_prog(pl);
+                             id = prog->aux->id;
+                             if
(copy_to_user(prog_ids + i, &id, sizeof(id)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                                     return -EFAULT;
+                             if (++i == cnt)
+                                     break;
+                     }
              }
+
+             if (prog_attach_flags)
+                     for (i = 0; i < cnt; i++)
+                             if
(copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                                     return -EFAULT;
+
+             prog_ids += cnt;
+             total_cnt -= cnt;
+             if (prog_attach_flags)
+                     prog_attach_flags += cnt;
      }
      return ret;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5ed2093e51cc..4137583c04a2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const
union bpf_attr *attr)
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      }
 }

-#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
+#define BPF_PROG_QUERY_LAST_FIELD
query.prog_attach_flags
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 static int bpf_prog_query(const union bpf_attr *attr,
                        union bpf_attr __user *uattr)
@@ -3556,6 +3556,7 @@ static int bpf_prog_query(const
union bpf_attr *attr,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      case BPF_CGROUP_SYSCTL:
      case BPF_CGROUP_GETSOCKOPT:
      case BPF_CGROUP_SETSOCKOPT:
+     case BPF_LSM_CGROUP:
              return cgroup_bpf_prog_query(attr, uattr);
      case BPF_LIRC_MODE2:
              return lirc_prog_query(attr, uattr);
@@ -4066,6 +4067,7 @@ static int
bpf_prog_get_info_by_fd(struct file *file,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      if (prog->aux->btf)
              info.btf_id = btf_obj_id(prog->aux->btf);
+     info.attach_btf_func_id =
prog->aux->attach_btf_id;
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
Note that exposing prog->aux->attach_btf_id only may not
be enough
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
unless it can assume info.attach_btf_id is always
referring to btf_vmlinux
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
for all bpf prog types.
We also export btf_id two lines above, right? Btw, I left
a comment in
quoted
quoted
quoted
quoted
quoted
quoted
quoted
the bpftool about those btf_ids, I'm not sure how resolve
them and
quoted
quoted
quoted
quoted
quoted
quoted
quoted
always assume vmlinux for now.
yeah, that btf_id above is the cgroup-lsm prog's btf_id
which has its
quoted
quoted
quoted
quoted
quoted
quoted
func info, line info...etc.   It is not the one the
attach_btf_id correspond
quoted
quoted
quoted
quoted
quoted
quoted
to.  attach_btf_id refers to either aux->attach_btf or
aux->dst_prog's btf (or
quoted
quoted
quoted
quoted
quoted
quoted
target btf id here).

It needs a consensus on where this attach_btf_id, target btf
id, and
quoted
quoted
quoted
quoted
quoted
quoted
prog_attach_flags should be.  If I read the patch 7 thread
correctly,
quoted
quoted
quoted
quoted
quoted
quoted
I think Andrii is suggesting to expose them to userspace
through link, so
quoted
quoted
quoted
quoted
quoted
quoted
potentially putting them in bpf_link_info.  The
bpf_prog_query will
quoted
quoted
quoted
quoted
quoted
quoted
output a list of link ids.  The same probably applies to
Yep and I think it makes sense because link is representing one
specific attachment (and I presume flags can be stored inside
the link
quoted
quoted
quoted
quoted
quoted
itself as well, right?).

But if legacy non-link BPF_PROG_ATTACH is supported then using
bpf_link_info won't cover legacy prog-only attachments.
I don't have any attachment to the legacy apis, I'm supporting
them
quoted
quoted
quoted
quoted
only because it takes two lines of code; we can go link-only if
there
quoted
quoted
quoted
quoted
is an agreement that it's inherently better.

How about I keep sys_bpf(BPF_PROG_QUERY) as is and I do a loop
in the
quoted
quoted
quoted
quoted
userspace (for BPF_LSM_CGROUP only) over all links
(BPF_LINK_GET_NEXT_ID) and will find the the ones with matching
prog
quoted
quoted
quoted
quoted
ids (BPF_LINK_GET_FD_BY_ID+BPF_OBJ_GET_INFO_BY_FD)?

That way we keep new fields in bpf_link_info, but we don't have to
extend sys_bpf(BPF_PROG_QUERY) because there doesn't seem to be
a good
quoted
quoted
quoted
quoted
way to do it. Exporting links via new link_fds would mean we'd
have to
quoted
quoted
quoted
quoted
support BPF_F_QUERY_EFFECTIVE, but getting an effective array of
links
quoted
quoted
quoted
quoted
seems to be messy. If, in the future, we figure out a better way
to
I don't see a clean way to get effective array from one individual
link[_info] through link iteration.  effective array is the progs that
will be run at a cgroup and in such order.  The prog running at a
cgroup doesn't necessarily linked to that cgroup.
Yeah, that's the problem with exposing links via prog_info; getting an
effective list is painful.
quoted
If staying with BPF_PROG_QUERY+BPF_F_QUERY_EFFECTIVE to get effective
array
and if it is decided the addition should be done in bpf_link_info,
then a list of link ids needs to be output instead of the current list of
prog ids.  The old attach type will still have to stay with the list of
prog ids though :/
quoted
It will be sad not to be able to get effective only for BPF_LSM_CGROUP.
I found it more useful to show what will be run at a cgroup and in such
order instead of what is linked to a cgroup.
See my hacky proof-of-concept below (on top of this series).
yeah. the PoC makes sense and I don't mind that considering
adding them to bpf_link_info (or bpf_prog_info) will be useful in
general even without this use case.

A quick thought is this is sort of partly going back to v6 but
just iterating different things instead of the bpf_lsm hooks.
I think if we keep prog_info as is (don't export anything new, don't
export the list of links), iterating through all links on the host should
work,
right? We get prog_ids list (effective or not, doesn't matter), then we
go through all the links and find the ones with with the same
prog_id (we can ignore cgroup, it shouldn't matter). Then we can export
attach_type/attach_btf_id/etc. If it happens to be slow in the future,
we can improve with some tbd interface to get the list of links for cgroup
(and then we'd have to care about effective list).

But the problem with going link-only is that I'd have to teach bpftool
to use links for BPF_LSM_CGROUP and it brings a bunch of problems:
* I'd have to pin those links somewhere to make them stick around
* Those pin paths essentially become an API now because "detach" now
  depends on them?
* (right now it automatically works with the legacy apis without any
changes)
It is already the current API for all links (tracing, cgroup...).  It goes
away (detach) with the process unless it is pinned.  but yeah, it will
be a new exception in the "bpftool cgroup" subcommand only for
BPF_LSM_CGROUP.

If it is an issue with your use case, may be going back to v6 that extends
the query bpf_attr with attach_btf_id and support both attach API ?
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help