Thread (16 messages) 16 messages, 3 authors, 2021-01-04

Re: [PATCH bpf-next 1/2] bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt

From: Song Liu <song@kernel.org>
Date: 2020-12-21 22:23:36
Also in: bpf

On Thu, Dec 17, 2020 at 9:24 AM Stanislav Fomichev [off-list ref] wrote:
When we attach a bpf program to cgroup/getsockopt any other getsockopt()
syscall starts incurring kzalloc/kfree cost. While, in general, it's
not an issue, sometimes it is, like in the case of TCP_ZEROCOPY_RECEIVE.
TCP_ZEROCOPY_RECEIVE (ab)uses getsockopt system call to implement
fastpath for incoming TCP, we don't want to have extra allocations in
there.

Let add a small buffer on the stack and use it for small (majority)
{s,g}etsockopt values. I've started with 128 bytes to cover
the options we care about (TCP_ZEROCOPY_RECEIVE which is 32 bytes
currently, with some planned extension to 64 + some headroom
for the future).
I don't really know the rule of thumb, but 128 bytes on stack feels too big to
me. I would like to hear others' opinions on this. Can we solve the problem
with some other mechanisms, e.g. a mempool?

[...]
quoted hunk ↗ jump to hunk
+static void *sockopt_export_buf(struct bpf_sockopt_kern *ctx)
+{
+       void *p;
+
+       if (ctx->optval != ctx->buf)
+               return ctx->optval;
+
+       /* We've used bpf_sockopt_kern->buf as an intermediary storage,
+        * but the BPF program indicates that we need to pass this
+        * data to the kernel setsockopt handler. No way to export
+        * on-stack buf, have to allocate a new buffer. The caller
+        * is responsible for the kfree().
+        */
+       p = kzalloc(ctx->optlen, GFP_USER);
+       if (!p)
+               return ERR_PTR(-ENOMEM);
+       memcpy(p, ctx->optval, ctx->optlen);
+       return p;
+}
+
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
                                       int *optname, char __user *optval,
                                       int *optlen, char **kernel_optval)
@@ -1389,8 +1420,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
                 * use original userspace data.
                 */
                if (ctx.optlen != 0) {
-                       *optlen = ctx.optlen;
-                       *kernel_optval = ctx.optval;
+                       void *buf = sockopt_export_buf(&ctx);
I found it is hard to follow the logic here (when to allocate memory, how to
fail over, etc.). Do we have plan to reuse sockopt_export_buf()? If not, it is
probably cleaner to put the logic in __cgroup_bpf_run_filter_setsockopt()?

Thanks,
Song

[...]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help