Thread (24 messages) 24 messages, 3 authors, 2022-08-03

Re: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map

From: Yafang Shao <hidden>
Date: 2022-08-02 13:47:41
Also in: bpf, linux-mm

On Tue, Aug 2, 2022 at 12:55 PM Alexei Starovoitov
[off-list ref] wrote:
On Fri, Jul 29, 2022 at 03:23:16PM +0000, Yafang Shao wrote:
quoted
A new member memcg_fd is introduced into bpf attr of BPF_MAP_CREATE
command, which is the fd of an opened cgroup directory. In this cgroup,
the memory subsystem must be enabled. This value is valid only when
BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel get the
memory cgroup from this fd, it will set this memcg into bpf map, then
all the subsequent memory allocation of this map will be charge to the
memcg.

The map creation paths in libbpf are also changed consequently.

Currently it is only supported for cgroup2 directory.

The usage of this new member as follows,
      struct bpf_map_create_opts map_opts = {
              .sz = sizeof(map_opts),
              .map_flags = BPF_F_SELECTABLE_MEMCG,
      };
      int memcg_fd, int map_fd;
      int key, value;

      memcg_fd = open("/cgroup2", O_DIRECTORY);
      if (memcg_fd < 0) {
              perror("memcg dir open");
              return -1;
      }

      map_opts.memcg_fd = memcg_fd;
      map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
                              sizeof(key), sizeof(value),
                              1024, &map_opts);
      if (map_fd <= 0) {
              perror("map create");
              return -1;
      }
Overall the api extension makes sense.
The flexibility of selecting memcg is useful.
Thanks!
quoted
Signed-off-by: Yafang Shao <redacted>
---
 include/uapi/linux/bpf.h       |  2 ++
 kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
 tools/include/uapi/linux/bpf.h |  2 ++
 tools/lib/bpf/bpf.c            |  1 +
 tools/lib/bpf/bpf.h            |  3 ++-
 tools/lib/bpf/libbpf.c         |  2 ++
 6 files changed, 46 insertions(+), 11 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5fc1ea70b59..a6e02c8be924 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1296,6 +1296,8 @@ union bpf_attr {
                                                 * struct stored as the
                                                 * map value
                                                 */
+             __s32   memcg_fd;       /* selectable memcg */
+             __s32   :32;            /* hole */
new fields cannot be inserted in the middle of uapi struct.
There's a "#define BPF_MAP_CREATE_LAST_FIELD map_extra" in
kernel/bpf/syscall.c, and thus I thought it may have some special
meaning, so I put the new field above it.
Now that it doesn't have any special meaning, I will change it as you suggested.
quoted
              /* Any per-map-type extra fields
               *
               * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6401cc417fa9..9900e2b87315 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }

 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
-     /* Currently if a map is created by a process belonging to the root
-      * memory cgroup, get_obj_cgroup_from_current() will return NULL.
-      * So we have to check map->objcg for being NULL each time it's
-      * being used.
-      */
-     map->objcg = get_obj_cgroup_from_current();
+     struct obj_cgroup *objcg;
+     struct cgroup *cgrp;
+
+     if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {
The flag is unnecessary. Just add memcg_fd to the end of attr and use != 0
as a condition that it should be used instead of get_obj_cgroup_from_current().
There are other parts of bpf uapi that have similar fd handling logic.
Right. There's a ensure_good_fd() to make the fd a positive number.
I will change it.
quoted
+             cgrp = cgroup_get_from_fd(attr->memcg_fd);
+             if (IS_ERR(cgrp))
+                     return -EINVAL;
+
+             objcg = get_obj_cgroup_from_cgroup(cgrp);
+             if (IS_ERR(objcg))
+                     return PTR_ERR(objcg);
+     } else {
+             /* Currently if a map is created by a process belonging to the root
+              * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+              * So we have to check map->objcg for being NULL each time it's
+              * being used.
+              */
+             objcg = get_obj_cgroup_from_current();
+     }
+
+     map->objcg = objcg;
+     return 0;
 }

 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 }

 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
 {
+     return 0;
 }

 static void bpf_map_release_memcg(struct bpf_map *map)
@@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
High level uapi struct should not be passed into low level helper like this.
Pls pass memcg_fd instead.
Sure, I will do it.

-- 
Regards
Yafang
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help