Re: [PATCH net-next 2/2] flow_dissector: Add limits for encapsulation and EH

From: Tom Herbert <hidden>
Date: 2017-09-01 16:50:00

On Fri, Sep 1, 2017 at 9:35 AM, Hannes Frederic Sowa
[off-list ref] wrote:

Hello Tom,

Tom Herbert [off-list ref] writes:

quoted

On Fri, Sep 1, 2017 at 6:32 AM, Hannes Frederic Sowa
[off-list ref] wrote:

quoted

Tom Herbert [off-list ref] writes:

quoted

In flow dissector there are no limits to the number of nested
encapsulations that might be dissected which makes for a nice DOS
attack. This patch limits for dissecting nested encapsulations
as well as for dissecting over extension headers.

I was actually more referring to your patch, because the flow dissector
right now is not stack recursive. Your changes would make it doing
recursion on the stack.

I don't believe those patches had any recursion.

I was wrong with stack recursion, you handle that using the
FLOW_DISSECT_RET_PROTO_AGAIN return value thus leaving the stack frame
again, sorry.

But otherwise the walk would be unlimited (based on the packet size) in
your first patchset, correct? See this malicious example:

| IP1 | UDP1 | VXLAN1 | Ethernet | IP2 | UDP2 | VXLAN2 | ...

Without the limits patch I subsequently proposed, yes. However, this
is true for all the other encapsulations anyway; there's is nothing
unique about UDP encapsulations in this regard (hence with the limit
patch should generally apply to all encapsulations).

where IP1 == IP2, UDP1 == UDP2 and VXLAN1 != VXLAN2?

Notice that because IP1 == IP2 and UDP1 == UDP2 it seems to me it would
hit the same socket again. We would be prone to overwrite vxlan id 1
with vxlan id 2 in the key thus the key would be malicious and traffic
could be injected into other tenant networks, if the encapsulated
packets within VXLAN1 could be generated by a malicious user?

This is why flow dissection is not an authoritative parsing of the
packet. It can be wrong or misleading because it doesn't have all the
context, doesn't necessarily parse the whole chain, and only returns
one set of information (for instance only one pair of IP addresses
when there may be more in a packet). It's just a best effort mechanism
that is great for computing a hash for instance. If someone is
steering a packet to a VM based on the output of flow dissector that
is a bug; the only correct way to do this is to go through the normal
receive protocol processing path.

I was actually not concerned about the "recursion" but merely about
updating the values to the innermost values.

See my previous comment about use STOP_AT_ENCAP.

Tom

quoted

Reported-by: Hannes Frederic Sowa <redacted>
Signed-off-by: Tom Herbert <redacted>
---
 net/core/flow_dissector.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5110180a3e96..1bca748de27d 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c

@@ -396,6 +396,35 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb,
      key_ip->ttl = iph->hop_limit;
 }

+/* Maximum number of nested encapsulations that can be processed in
+ * __skb_flow_dissect
+ */
+#define MAX_FLOW_DISSECT_ENCAPS      5

I think you can exactly parse one encapsulation layer safely. This is
because you can only keep state on one encapsulation layer protocol
right now.

For example this scenario:

I would like to circumvent tc flower rules that filter base
simply construct a packet looking like:

Ethernet|Vlan|IP|GRE|Ethernet|Vlan|

because we don't recurse in the flow keys either, the second vlan header
would overwrite the information of the first one.

Flow dissector returns the inner most information it sees. If only
information from the outermost headers is needed then the caller
should set FLOW_DISSECTOR_F_STOP_AT_ENCAP in the flags.

Right now, flower does not do so and seems to be prone to be fooled like
that.

Additionally, flow_dissector does not update all keys to its inner most
values also, e.g. I don't see the inner packets updating
FLOW_DISSECTOR_KEY_ETH_ADDRS (e.g. with ETH_P_TEB).

To me it would make more sense to be able to specificy
FLOW_DISSECTOR_F_AFTER_FIRST_ENCAP, so that we get the first
encapsulation information with the key, do policy decision, pull headers
and recurse here.

quoted

+
+static bool skb_flow_dissect_encap_allowed(int *num_encaps, unsigned int *flags)
+{
+     ++*num_encaps;
+
+     if (*num_encaps >= MAX_FLOW_DISSECT_ENCAPS) {
+             if (*num_encaps == MAX_FLOW_DISSECT_ENCAPS) {
+                     /* Allow one more pass but ignore disregard
+                      * further encapsulations
+                      */
+                     *flags |= FLOW_DISSECTOR_F_STOP_AT_ENCAP;
+             } else {
+                     /* Max encaps reached */
+                     return  false;
+             }
+     }
+
+     return true;
+}
+
+/* Maximum number of extension headers can be processed in __skb_flow_dissect
+ * per IPv6 packet
+ */
+#define MAX_FLOW_DISSECT_EH  5

I would at least allow each extension header once, DEST_OPS twice, thus
let's say 12? It is safe in this version because it does not consume
stack space anyway and doesn't update internal state.

This per each IPv6 header. This patch would allow up to five
encapsulations of IPv6/IPv6 so maximum number of EHs we would consider
is 6*5=30.

I would still increase the limit, but also no hard feelings. The reason
I came up with 12 is that anyway each header (besides DST_OPS) is only
allowed to show up once per header (11 + 1).

I am fine that this is a per-IPv6 header limit.

quoted

+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified

@@ -426,6 +455,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
      struct flow_dissector_key_tags *key_tags;
      struct flow_dissector_key_vlan *key_vlan;
      enum flow_dissect_ret fdret;
+     int num_eh, num_encaps = 0;
      bool skip_vlan = false;
      u8 ip_proto = 0;
      bool ret;
@@ -714,7 +744,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
      case FLOW_DISSECT_RET_OUT_GOOD:
              goto out_good;
      case FLOW_DISSECT_RET_PROTO_AGAIN:
-             goto proto_again;
+             if (skb_flow_dissect_encap_allowed(&num_encaps, &flags))
+                     goto proto_again;

I think you should get the check to the proto_again label. In case you
loop to often you can `goto out_good'.

That would add an extra conditional in the common case of no
encapsulation. Having it here means we only care about the
encapsulation limit when there is encapsulation.

In my opinion it would make the limits easier to grasp, but no hard
feelings about that.

Thanks,
Hannes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help