Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP

From: Martin KaFai Lau <hidden>
Date: 2022-05-26 00:04:06
Also in: bpf

On Wed, May 25, 2022 at 02:25:54PM -0700, sdf@google.com wrote:

On 05/25, Martin KaFai Lau wrote:

quoted

On Wed, May 25, 2022 at 10:02:07AM -0700, Stanislav Fomichev wrote:

quoted

On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev [off-list ref]

wrote:

quoted

On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko
[off-list ref] wrote:

quoted

On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev

[off-list ref] wrote:

quoted

On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko
[off-list ref] wrote:

quoted

On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau

[off-list ref] wrote:

quoted

quoted
quoted
quoted
quoted
quoted
On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichev

wrote:

quoted

quoted
quoted
quoted
quoted
quoted
quoted
On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau

[off-list ref] wrote:

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
On Wed, May 18, 2022 at 03:55:25PM -0700, Stanislav

Fomichev wrote:

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
We have two options:
1. Treat all BPF_LSM_CGROUP the same, regardless of

attach_btf_id

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate

hook point

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
I was doing (2) in the original patch, but switching

to (1) here:

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
* bpf_prog_query returns all attached BPF_LSM_CGROUP

programs

quoted

regardless of attach_btf_id
* attach_btf_id is exported via bpf_prog_info

Signed-off-by: Stanislav Fomichev <redacted>
---
 include/uapi/linux/bpf.h |   5 ++
 kernel/bpf/cgroup.c      | 103

+++++++++++++++++++++++++++------------

quoted

 kernel/bpf/syscall.c     |   4 +-
 3 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/bpf.h

b/include/uapi/linux/bpf.h

quoted

index b9d2d6de63a7..432fc5f49567 100644

--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h

@@ -1432,6 +1432,7 @@ union bpf_attr {
              __u32           attach_flags;
              __aligned_u64   prog_ids;
              __u32           prog_cnt;
+             __aligned_u64   prog_attach_flags; /*

output: per-program attach_flags */

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      } query;

      struct { /* anonymous struct used by

BPF_RAW_TRACEPOINT_OPEN command */

quoted

@@ -5911,6 +5912,10 @@ struct bpf_prog_info {
      __u64 run_cnt;
      __u64 recursion_misses;
      __u32 verified_insns;
+     /* BTF ID of the function to attach to within

BTF object identified

quoted

+      * by btf_id.
+      */
+     __u32 attach_btf_func_id;
 } __attribute__((aligned(8)));

 struct bpf_map_info {

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a959cdd22870..08a1015ee09e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c

@@ -1074,6 +1074,7 @@ static int

cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 static int __cgroup_bpf_query(struct cgroup *cgrp,

const union bpf_attr *attr,

quoted

                            union bpf_attr __user

*uattr)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
 {
+     __u32 __user *prog_attach_flags =

u64_to_user_ptr(attr->query.prog_attach_flags);

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      __u32 __user *prog_ids =

u64_to_user_ptr(attr->query.prog_ids);

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      enum bpf_attach_type type =

attr->query.attach_type;

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      enum cgroup_bpf_attach_type atype;
@@ -1081,50 +1082,92 @@ static int

__cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,

quoted

      struct hlist_head *progs;
      struct bpf_prog *prog;
      int cnt, ret = 0, i;
+     int total_cnt = 0;
      u32 flags;

-     atype = to_cgroup_bpf_attach_type(type);
-     if (atype < 0)
-             return -EINVAL;
+     enum cgroup_bpf_attach_type from_atype, to_atype;

-     progs = &cgrp->bpf.progs[atype];
-     flags = cgrp->bpf.flags[atype];
+     if (type == BPF_LSM_CGROUP) {
+             from_atype = CGROUP_LSM_START;
+             to_atype = CGROUP_LSM_END;
+     } else {
+             from_atype =

to_cgroup_bpf_attach_type(type);

quoted

+             if (from_atype < 0)
+                     return -EINVAL;
+             to_atype = from_atype;
+     }

-     effective =

rcu_dereference_protected(cgrp->bpf.effective[atype],

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-

lockdep_is_held(&cgroup_mutex));

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     for (atype = from_atype; atype <= to_atype;

atype++) {

quoted

+             progs = &cgrp->bpf.progs[atype];
+             flags = cgrp->bpf.flags[atype];

-     if (attr->query.query_flags &

BPF_F_QUERY_EFFECTIVE)

quoted

-             cnt = bpf_prog_array_length(effective);
-     else
-             cnt = prog_list_length(progs);
+             effective =

rcu_dereference_protected(cgrp->bpf.effective[atype],

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+

lockdep_is_held(&cgroup_mutex));

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-     if (copy_to_user(&uattr->query.attach_flags,

&flags, sizeof(flags)))

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-             return -EFAULT;
-     if (copy_to_user(&uattr->query.prog_cnt, &cnt,

sizeof(cnt)))

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             if (attr->query.query_flags &

BPF_F_QUERY_EFFECTIVE)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     total_cnt +=

bpf_prog_array_length(effective);

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+             else
+                     total_cnt +=

prog_list_length(progs);

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     }
+
+     if (type != BPF_LSM_CGROUP)
+             if

(copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     return -EFAULT;
+     if (copy_to_user(&uattr->query.prog_cnt,

&total_cnt, sizeof(total_cnt)))

quoted

              return -EFAULT;
-     if (attr->query.prog_cnt == 0 || !prog_ids ||

!cnt)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+     if (attr->query.prog_cnt == 0 || !prog_ids ||

!total_cnt)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
              /* return early if user requested only

program count + flags */

quoted

              return 0;
-     if (attr->query.prog_cnt < cnt) {
-             cnt = attr->query.prog_cnt;
+
+     if (attr->query.prog_cnt < total_cnt) {
+             total_cnt = attr->query.prog_cnt;
              ret = -ENOSPC;
      }

-     if (attr->query.query_flags &

BPF_F_QUERY_EFFECTIVE) {

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-             return

bpf_prog_array_copy_to_user(effective, prog_ids, cnt);

quoted

-     } else {
-             struct bpf_prog_list *pl;
-             u32 id;
+     for (atype = from_atype; atype <= to_atype;

atype++) {

quoted

+             if (total_cnt <= 0)
+                     break;

-             i = 0;
-             hlist_for_each_entry(pl, progs, node) {
-                     prog = prog_list_prog(pl);
-                     id = prog->aux->id;
-                     if (copy_to_user(prog_ids + i,

&id, sizeof(id)))

quoted

-                             return -EFAULT;
-                     if (++i == cnt)
-                             break;
+             progs = &cgrp->bpf.progs[atype];
+             flags = cgrp->bpf.flags[atype];
+
+             effective =

rcu_dereference_protected(cgrp->bpf.effective[atype],

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+

lockdep_is_held(&cgroup_mutex));

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+
+             if (attr->query.query_flags &

BPF_F_QUERY_EFFECTIVE)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     cnt =

bpf_prog_array_length(effective);

quoted

+             else
+                     cnt = prog_list_length(progs);
+
+             if (cnt >= total_cnt)
+                     cnt = total_cnt;
+
+             if (attr->query.query_flags &

BPF_F_QUERY_EFFECTIVE) {

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
+                     ret =

bpf_prog_array_copy_to_user(effective, prog_ids, cnt);

quoted

+             } else {
+                     struct bpf_prog_list *pl;
+                     u32 id;
+
+                     i = 0;
+                     hlist_for_each_entry(pl, progs,

node) {

quoted

+                             prog = prog_list_prog(pl);
+                             id = prog->aux->id;
+                             if

(copy_to_user(prog_ids + i, &id, sizeof(id)))

quoted

+                                     return -EFAULT;
+                             if (++i == cnt)
+                                     break;
+                     }
              }
+
+             if (prog_attach_flags)
+                     for (i = 0; i < cnt; i++)
+                             if

(copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))

quoted

+                                     return -EFAULT;
+
+             prog_ids += cnt;
+             total_cnt -= cnt;
+             if (prog_attach_flags)
+                     prog_attach_flags += cnt;
      }
      return ret;
 }

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5ed2093e51cc..4137583c04a2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c

@@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const

union bpf_attr *attr)

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
      }
 }

-#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
+#define BPF_PROG_QUERY_LAST_FIELD

query.prog_attach_flags

quoted

 static int bpf_prog_query(const union bpf_attr *attr,
                        union bpf_attr __user *uattr)

@@ -3556,6 +3556,7 @@ static int bpf_prog_query(const

union bpf_attr *attr,

quoted

      case BPF_CGROUP_SYSCTL:
      case BPF_CGROUP_GETSOCKOPT:
      case BPF_CGROUP_SETSOCKOPT:
+     case BPF_LSM_CGROUP:
              return cgroup_bpf_prog_query(attr, uattr);
      case BPF_LIRC_MODE2:
              return lirc_prog_query(attr, uattr);

@@ -4066,6 +4067,7 @@ static int

bpf_prog_get_info_by_fd(struct file *file,

quoted

      if (prog->aux->btf)
              info.btf_id = btf_obj_id(prog->aux->btf);
+     info.attach_btf_func_id =

prog->aux->attach_btf_id;

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
Note that exposing prog->aux->attach_btf_id only may not

be enough

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
unless it can assume info.attach_btf_id is always

referring to btf_vmlinux

quoted

quoted
quoted
quoted
quoted
quoted
quoted
quoted
for all bpf prog types.
We also export btf_id two lines above, right? Btw, I left

a comment in

quoted

quoted
quoted
quoted
quoted
quoted
quoted
the bpftool about those btf_ids, I'm not sure how resolve

them and

quoted

always assume vmlinux for now.

yeah, that btf_id above is the cgroup-lsm prog's btf_id

which has its

quoted

quoted
quoted
quoted
quoted
quoted
func info, line info...etc.   It is not the one the

attach_btf_id correspond

quoted

quoted
quoted
quoted
quoted
quoted
to.  attach_btf_id refers to either aux->attach_btf or

aux->dst_prog's btf (or

quoted

target btf id here).

It needs a consensus on where this attach_btf_id, target btf

id, and

quoted

quoted
quoted
quoted
quoted
quoted
prog_attach_flags should be.  If I read the patch 7 thread

correctly,

quoted

quoted
quoted
quoted
quoted
quoted
I think Andrii is suggesting to expose them to userspace

through link, so

quoted

quoted
quoted
quoted
quoted
quoted
potentially putting them in bpf_link_info.  The

bpf_prog_query will

quoted

output a list of link ids.  The same probably applies to

Yep and I think it makes sense because link is representing one
specific attachment (and I presume flags can be stored inside

the link

quoted

itself as well, right?).

But if legacy non-link BPF_PROG_ATTACH is supported then using
bpf_link_info won't cover legacy prog-only attachments.

I don't have any attachment to the legacy apis, I'm supporting

them

quoted

only because it takes two lines of code; we can go link-only if

there

quoted

is an agreement that it's inherently better.

How about I keep sys_bpf(BPF_PROG_QUERY) as is and I do a loop

in the

quoted

userspace (for BPF_LSM_CGROUP only) over all links
(BPF_LINK_GET_NEXT_ID) and will find the the ones with matching

prog

quoted

ids (BPF_LINK_GET_FD_BY_ID+BPF_OBJ_GET_INFO_BY_FD)?

That way we keep new fields in bpf_link_info, but we don't have to
extend sys_bpf(BPF_PROG_QUERY) because there doesn't seem to be

a good

quoted

way to do it. Exporting links via new link_fds would mean we'd

have to

quoted

support BPF_F_QUERY_EFFECTIVE, but getting an effective array of

links

quoted

seems to be messy. If, in the future, we figure out a better way

to
I don't see a clean way to get effective array from one individual
link[_info] through link iteration.  effective array is the progs that
will be run at a cgroup and in such order.  The prog running at a
cgroup doesn't necessarily linked to that cgroup.

Yeah, that's the problem with exposing links via prog_info; getting an
effective list is painful.

quoted

If staying with BPF_PROG_QUERY+BPF_F_QUERY_EFFECTIVE to get effective
array
and if it is decided the addition should be done in bpf_link_info,
then a list of link ids needs to be output instead of the current list of
prog ids.  The old attach type will still have to stay with the list of
prog ids though :/

quoted

It will be sad not to be able to get effective only for BPF_LSM_CGROUP.
I found it more useful to show what will be run at a cgroup and in such
order instead of what is linked to a cgroup.

See my hacky proof-of-concept below (on top of this series).

yeah. the PoC makes sense and I don't mind that considering
adding them to bpf_link_info (or bpf_prog_info) will be useful in
general even without this use case.

A quick thought is this is sort of partly going back to v6 but
just iterating different things instead of the bpf_lsm hooks.

I think if we keep prog_info as is (don't export anything new, don't
export the list of links), iterating through all links on the host should
work,
right? We get prog_ids list (effective or not, doesn't matter), then we
go through all the links and find the ones with with the same
prog_id (we can ignore cgroup, it shouldn't matter). Then we can export
attach_type/attach_btf_id/etc. If it happens to be slow in the future,
we can improve with some tbd interface to get the list of links for cgroup
(and then we'd have to care about effective list).

But the problem with going link-only is that I'd have to teach bpftool
to use links for BPF_LSM_CGROUP and it brings a bunch of problems:
* I'd have to pin those links somewhere to make them stick around
* Those pin paths essentially become an API now because "detach" now
  depends on them?
* (right now it automatically works with the legacy apis without any
changes)

It is already the current API for all links (tracing, cgroup...).  It goes
away (detach) with the process unless it is pinned.  but yeah, it will
be a new exception in the "bpftool cgroup" subcommand only for
BPF_LSM_CGROUP.

If it is an issue with your use case, may be going back to v6 that extends
the query bpf_attr with attach_btf_id and support both attach API ?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help