Thread: 11 messages, 5 authors, 2018-05-25

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

From: Y Song <hidden>
Date: 2018-05-25 16:29:06
Also in: cgroups, lkml

On Fri, May 25, 2018 at 8:21 AM, Alban Crequy [off-list ref] wrote:
On Wed, May 23, 2018 at 4:34 AM Y Song [off-list ref] wrote:
I did some quick prototyping, and the above interface seems to work fine.

Thanks! I gave your kernel patch & userspace program a try and it works for
me on cgroup-v2.

Also, I found out how to get my containers to use both cgroup-v1 and
cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
'--exec-opt native.cgroupdriver=systemd' option). So I should be able to
use the BPF helper function without having to add support for all the
cgroup-v1 hierarchies.

Great. I will submit a formal patch soon.

The kernel change:
===============

[yhs@localhost bpf-next]$ git diff
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..669b7383fddb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1976,7 +1976,8 @@ union bpf_attr {
         FN(fib_lookup),                 \
         FN(sock_hash_update),           \
         FN(msg_redirect_hash),          \
-       FN(sk_redirect_hash),
+       FN(sk_redirect_hash),           \
+       FN(get_current_cgroup_id),

  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
   * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..e11e3298f911 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
         .arg2_type      = ARG_ANYTHING,
  };

+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+       struct cgroup *cgrp = task_dfl_cgroup(current);
+       if (!cgrp)
+               return -EINVAL;
+
+       return cgrp->kn->id.id;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+       .func           = bpf_get_current_cgroup_id,
+       .gpl_only       = false,
+       .ret_type       = RET_INTEGER,
+};
+
  BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
            const void *, unsafe_ptr)
  {
@@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
                 return &bpf_get_prandom_u32_proto;
         case BPF_FUNC_probe_read_str:
                 return &bpf_probe_read_str_proto;
+       case BPF_FUNC_get_current_cgroup_id:
+               return &bpf_get_current_cgroup_id_proto;
         default:
                 return NULL;
         }

The following program prints the cgroup id for a given cgroup path.

[yhs@localhost cg]$ cat get_cgroup_id.c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
     int dirfd, err, flags, mount_id, fhsize;
     struct file_handle *fhp;
     char *pathname;

     if (argc != 2) {
         printf("usage: %s <cgroup_path>\n", argv[0]);
         return 1;
     }

     pathname = argv[1];
     dirfd = AT_FDCWD;
     flags = 0;

     fhsize = sizeof(*fhp);
     fhp = malloc(fhsize);
     if (!fhp)
         return 1;

     /* Probe call: with handle_bytes set to 0 this is expected to fail
      * with EOVERFLOW and store the required size in handle_bytes. */
     fhp->handle_bytes = 0;
     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
     if (err >= 0 || errno != EOVERFLOW) {
         printf("unexpected result from the name_to_handle_at probe\n");
         return 1;
     }

     /* Grow the buffer to hold the handle, then fetch it for real. */
     fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
     fhp = realloc(fhp, fhsize);
     if (!fhp)
         return 1;

     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
     if (err < 0) {
         perror("name_to_handle_at");
         return 1;
     } else {
         printf("dir = %s, mount_id = %d\n", pathname, mount_id);
         printf("handle_bytes = %u, handle_type = %d\n", fhp->handle_bytes,
             fhp->handle_type);
         /* The handle is expected to be exactly the 64-bit cgroup id. */
         if (fhp->handle_bytes != 8)
             return 1;

         printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
     }

     free(fhp);
     return 0;
}
[yhs@localhost cg]$

Given a cgroup path, the user can get the cgroup_id and use it in their bpf
program for filtering.
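
As a rough illustration of that filtering step, the userspace C sketch below mocks the comparison a BPF program might perform. `mock_get_current_cgroup_id` is a hypothetical stand-in for the `bpf_get_current_cgroup_id` helper from the patch, and `mock_current_cgroup_id` and `should_trace` are names invented for this sketch. It is not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-kernel helper. In a real BPF program
 * this would be a call to bpf_get_current_cgroup_id() instead. */
static uint64_t mock_current_cgroup_id;

static uint64_t mock_get_current_cgroup_id(void)
{
    return mock_current_cgroup_id;
}

/* Keep an event only when the current task's cgroup id equals the id
 * that user space obtained for the cgroup path of interest. */
static bool should_trace(uint64_t target_cgroup_id)
{
    return mock_get_current_cgroup_id() == target_cgroup_id;
}
```

In an actual tracing setup, the target id would presumably be the value printed by get_cgroup_id, encoded into the BPF program before it is loaded.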

I ran a simple program, t.c:
    int main() { while(1) sleep(1); return 0; }
in the cgroup v2 directory /home/yhs/tmp/yhs:
    none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

$ ./get_cgroup_id /home/yhs/tmp/yhs
dir = /home/yhs/tmp/yhs, mount_id = 124
handle_bytes = 8, handle_type = 1
cgroup_id = 0x1000006b2

// The command below gets the cgroup_id from the kernel for the
// process compiled from t.c and running under /home/yhs/tmp/yhs:
$ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
PID     TID     COMM            FUNC             -
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
^C[yhs@localhost tools]$

The kernel and user space cgids match. I will provide a
formal patch later.
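
The 64-bit id can also be read as two 32-bit halves. The sketch below assumes the id packs the kernfs inode number in the low 32 bits and the generation number in the high 32 bits, which is how `union kernfs_node_id` would lay out on a little-endian machine. Treat that layout as an assumption rather than a stable ABI. `cgid_ino` and `cgid_generation` are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed layout: low 32 bits = inode number, high 32 bits = generation. */
static inline uint32_t cgid_ino(uint64_t id)
{
    return (uint32_t)(id & 0xffffffffu);
}

static inline uint32_t cgid_generation(uint64_t id)
{
    return (uint32_t)(id >> 32);
}
```

Under that assumption, the id 0x1000006b2 shown above splits into ino 0x6b2 and generation 1.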


On Mon, May 21, 2018 at 5:24 PM, Y Song [off-list ref] wrote:
On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov [off-list ref] wrote:
On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
+BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
+{
+     // TODO: pick the correct hierarchy instead of the mem controller
+     struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
+
+     if (unlikely(!cgrp))
+             return -EINVAL;
+     if (unlikely(hierarchy))
+             return -EINVAL;
+     if (unlikely(flags))
+             return -EINVAL;
+
+     return cgrp->kn->id.ino;

The ino alone is not enough to identify a cgroup. It needs the generation
number too.

I don't quite see how hierarchy and flags can be used in the future.
Also why limit it to memcg?

How about something like this instead:

BPF_CALL_0(bpf_get_current_cgroup_id)
{
        struct cgroup *cgrp = task_dfl_cgroup(current);

        return cgrp->kn->id.id;
}

The user space can use the fhandle API to get the same 64-bit id.

I think this should work. This will also be useful to bcc, as user space can
encode the desired id in the bpf program and compare that id to the current
cgroup id, so we can have cgroup-level tracing (esp. stat collection) support.
To cope with the cgroup hierarchy, the user can use a cgroup-array based
approach or explicitly compare against multiple cgroup ids.
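
The "compare against multiple cgroup ids" option can be sketched in plain C as a membership test over a small list of ids, for example a parent cgroup plus its children. `cgroup_id_in_set` is a hypothetical name; this is a userspace illustration of the idea, not code from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when current_id equals any of the n ids in the list.
 * The ids would be collected in user space, one per cgroup directory. */
static bool cgroup_id_in_set(uint64_t current_id, const uint64_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == current_id)
            return true;
    }
    return false;
}
```

Inside a BPF program of that era, a loop like this would likely have to be unrolled or replaced with a map lookup, since the verifier did not accept loops at the time.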