Re: [PATCH bpf-next 1/2] bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt
From: Song Liu <song@kernel.org>
Date: 2020-12-21 22:23:36
Also in:
bpf
On Thu, Dec 17, 2020 at 9:24 AM Stanislav Fomichev [off-list ref] wrote:
When we attach a bpf program to cgroup/getsockopt any other getsockopt()
syscall starts incurring kzalloc/kfree cost. While, in general, it's
not an issue, sometimes it is, like in the case of TCP_ZEROCOPY_RECEIVE.
TCP_ZEROCOPY_RECEIVE (ab)uses getsockopt system call to implement
fastpath for incoming TCP, we don't want to have extra allocations in
there.
Let add a small buffer on the stack and use it for small (majority)
{s,g}etsockopt values. I've started with 128 bytes to cover
the options we care about (TCP_ZEROCOPY_RECEIVE which is 32 bytes
currently, with some planned extension to 64 + some headroom
for the future).I don't really know the rule of thumb, but 128 bytes on stack feels too big to me. I would like to hear others' opinions on this. Can we solve the problem with some other mechanisms, e.g. a mempool? [...]
quoted hunk ↗ jump to hunk
+static void *sockopt_export_buf(struct bpf_sockopt_kern *ctx) +{ + void *p; + + if (ctx->optval != ctx->buf) + return ctx->optval; + + /* We've used bpf_sockopt_kern->buf as an intermediary storage, + * but the BPF program indicates that we need to pass this + * data to the kernel setsockopt handler. No way to export + * on-stack buf, have to allocate a new buffer. The caller + * is responsible for the kfree(). + */ + p = kzalloc(ctx->optlen, GFP_USER); + if (!p) + return ERR_PTR(-ENOMEM); + memcpy(p, ctx->optval, ctx->optlen); + return p; +} + int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, int *optname, char __user *optval, int *optlen, char **kernel_optval)@@ -1389,8 +1420,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, * use original userspace data. */ if (ctx.optlen != 0) { - *optlen = ctx.optlen; - *kernel_optval = ctx.optval; + void *buf = sockopt_export_buf(&ctx);
I found it is hard to follow the logic here (when to allocate memory, how to fail over, etc.). Do we have plan to reuse sockopt_export_buf()? If not, it is probably cleaner to put the logic in __cgroup_bpf_run_filter_setsockopt()? Thanks, Song [...]