Re: [PATCH v3 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF

From: Chema Gonzalez <hidden>
Date: 2014-05-16 18:41:28

On Wed, May 14, 2014 at 3:44 PM, Alexei Starovoitov [off-list ref] wrote:

please try 'modprobe test_bpf'… you'll see most ld_proto, ld_vlan, …
tests failing..

Fixed.

quoted

sk_run_filter() is a generic interpreter that should be suitable for
sockets, seccomp, tracing, etc. Adding one use case specific data
structure to interpreter is not desirable.

That's an interesting point, which you'd agree does apply to "ld poff"
right now. There's a tradeoff here between making the BPF VM as
generic as possible, and having to reimplement in BPF VM code
functions that already exist in the kernel (the flow dissector
itself). I think not having 2 flow dissectors in the kernel (one in C,
one in BPF VM code) is a win.

You misunderstood. That's not at all what I said. See the diff below.

OK. Fixed using the extended eBPF stack (EES). Note that I added an
explicit struct to put stuff in the EES (instead of assuming what's
there).

quoted

Doing memset(&flow,0...) every invocation is not performance
friendly either. Majority of filters will suffer.

Will fix. In retrospect, I should have just added an flow_keys_is_init
field instead of mapping {0, 0, {0}, 0, 0} to "not inited."

Done.

quoted

You can do the same without touching sk_run_filter().
For example, you can tweak sk_convert_filter() to pass
R10(frame pointer) as arg4 to helper functions
(__skb_get_pay_offset() and others)
then inside those functions cast 'arg4 - appropriate offset'
to struct flow_keys and use it.
BPF was extended from 16 spill/fill only slots to generic
stack space exactly for this type of use cases.

I tried several implementations, including the one you propose:

nope. see the diff of what I'm proposing.

Done.

quoted hunk ↗ jump to hunk

quoted

1. add flow_keys to skb CB (packet_skb_cb)
The problem here is that struct flow_keys (15 bytes) does not fit in
AF_PACKET's 48-byte skb CB. I also tried to create a small_flow_keys
struct that only contains L3 info (nhoff/proto, which is the minimum
output you need from the flow dissector to be able to get the rest
without having to rewalk the packet), or maybe L3 and L4
(thoff/ip_proto). The problem is that the packet CB is completely full
already. packet_skb_cb is 4 (origlen) + 20 (sockaddr_ll) and
MAX_ADDR_LEN=32, so the net/packet/af_packet.c:1822 check is 4+20+32-8
= 48, which is already the size of the sk_buff->CB.

2. add flow_keys using the stack in the 3 BPF filter runners
(original, eBPF, JIT)
  - (-) there are 6x BPF engines (1x non-JIT and 5x JIT)
  - (+) we can just do the non-JIT version, and let JITs implement it
when they want)
  - (+) very simple

3. add FLOW, BPF_REG_FLOW, map it to a register, and then pass it as
R4. This is what you're proposing, IIIC. I found the implementation
way more complicated (in fact I gave up trying to make it not crash
;), but I guess this is my own taste. What really convinced me to
extend the context is that, if you think about it, both the skb and
the flow_keys are part of the context that the bpf_calls use to run.
If we pass flow_keys in R4, and we ever want to add more context
parameters to a call, that means using R5. And then what?

quoted

Key point: you can add pretty much any feature to classic BPF
by tweaking converter from classic to internal without changing
interpreter and causing trainwreck for non-socket use cases.

Again, the patch is not breaking the current functionality. We already
provide "ld poff" as a recognition that packet processing is a first
class citizen of the BPF VM. If you run a seccomp filter containing a
"ld poff" right now, you're probably causing issues anyway.

'ld poff' is not first class citizen of interpreter. There is no such eBPF insn.
It's a classic BPF instruction. It is converted into a sequence of eBPF
insns including bpf_call.

Just to reduce the number of back and forth emails here is the diff
of what you want to do that doesn't break things:
(sorry for whitespace mangling)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7a9beeb1c458..10f580e44a33 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h

@@ -3065,7 +3065,7 @@ bool skb_partial_csum_set(struct sk_buff *skb,

u16 start, u16 off);

 int skb_checksum_setup(struct sk_buff *skb, bool recalculate);

-u32 __skb_get_poff(const struct sk_buff *skb);
+u32 __skb_get_poff(const struct sk_buff *skb, struct flow_keys *keys);

 /**
  * skb_head_is_locked - Determine if the skb->head is locked down

diff --git a/net/core/filter.c b/net/core/filter.c
index 32c5b44c537e..1179f2fb4c95 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c

@@ -602,7 +602,12 @@ static unsigned int pkt_type_offset(void)

 static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
 {
-       return __skb_get_poff((struct sk_buff *)(unsigned long) ctx);
+       /* top BPF_MEMWORDS * 4 bytes are used to represent classic BPF
+        * mem[0-15] slots, use next sizeof(struct flow_keys) bytes
+        * of stack to share flow_dissect-ed data
+        */
+       struct flow_keys *keys = (void *) r4 - BPF_MEMWORDS * 4 - sizeof(*keys);
+       return __skb_get_poff((struct sk_buff *)(unsigned long) ctx, keys);
 }

 static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)

@@ -783,6 +788,10 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
                *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG3, BPF_REG_X);
                insn++;

+               /* arg4 = FP */
+               *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG4, BPF_REG_FP);
+               insn++;
+
                /* Emit call(ctx, arg2=A, arg3=X) */
                insn->code = BPF_JMP | BPF_CALL;
                switch (fp->k) {

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 107ed12a5323..c8d9f16e4872 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c

@@ -275,16 +275,15 @@ EXPORT_SYMBOL(__skb_tx_hash);
  * truncate packets without needing to push actual payload to the user
  * space and can analyze headers only, instead.
  */
-u32 __skb_get_poff(const struct sk_buff *skb)
+u32 __skb_get_poff(const struct sk_buff *skb, struct flow_keys *keys)
 {
-       struct flow_keys keys;
        u32 poff = 0;

-       if (!skb_flow_dissect(skb, &keys))
+       if (!skb_flow_dissect(skb, keys))
                return 0;

-       poff += keys.thoff;
-       switch (keys.ip_proto) {
+       poff += keys->thoff;
+       switch (keys->ip_proto) {
        case IPPROTO_TCP: {

now just use the same 'keys' address in your new get_off calls.
First word can be used to indicate whether flow_dissector() was called
or not.

Above is tested with and without JIT with BPF testsuite.

Patches now.
-Chema

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help