Re: [PATCH V4 net 0/2] ipv6: fix flowlabel issue for reset packet
From: Martin KaFai Lau <hidden>
Date: 2017-11-15 18:27:44
On Tue, Nov 14, 2017 at 01:59:03PM -0800, Shaohua Li wrote:
On Tue, Nov 14, 2017 at 11:13:10AM -0800, Tom Herbert wrote:quoted
On Tue, Nov 14, 2017 at 10:24 AM, Shaohua Li [off-list ref] wrote:quoted
On Wed, Nov 08, 2017 at 09:44:51AM -0800, Tom Herbert wrote:quoted
On Fri, Aug 18, 2017 at 3:27 PM, David Miller [off-list ref] wrote:quoted
From: Martin KaFai Lau <redacted> Date: Fri, 18 Aug 2017 13:51:36 -0700quoted
It seems like that middle box specifically drops TCP_RST if it does not know anything about this flow. Since the flowlabel of the TCP_RST (sent in TW state) is always different, it always lands to a different middle box. All of these TCP_RST cannot be delivered.This really is illegal behavior. The flow label is not a flow _KEY_ by any definition whatsoever. Flow labels are an optimization, not a determinant for flow matching particularly for proper TCP state processing. I'd rather you invest all of this energy getting that vendor to fix their kit.We're now seeing several router vendors recommending people to not use flow labels for ECMP hashing. This is precisely because when a flow label changes, network devices that maintain state (firewalls, NAT, load balancers) can't deal with packets being rerouted so connections are dropped. Unfortunately, the need for packets of a flow to always follow the same path has become an implicit requirement that I think we need follow at least as the default behavior. Martin: is there any change you could resurrect these patches? In order to solve the general problem of making routing consistent, I believe we want to keep sk_tx_hash consistent for the connection from which a consistent flow label can be derived. To avoid the overhead of a hash field in sk_common, maybe we could initially set a connection hash to a five-tuple hash for a flow instead of a random value? So in TW state the consistent hash can be computed on the fly.Hi Tom, Do we really need to use the five-tupe hash? There are several places using current random hash, which looks more lightweight. To fix issue, we only need to make sure reset packet include the correct flowlabel. Like what my previous patch did, we can set tw->tw_flowlabel in tcp_time_wait based on txhash and use it reset packet. In this way we can use the random hash and not add extra field in sock.Shaohua, But that patch discards the full txhash in TW. So it's not just a problem with the flow label. sk_tx_hash can also be used for route selection in ECMP, port selection we're doing tunneling, etc. The general solution should maintains tx_hash or be able to reconstruct it in any state, flow label fix is a point solution.Hi Tom, do you want to keep sk_rethink_txhash() then? If we changed the hash to random number, we can't reconstruct it for sure.
A followup question on rethink. Does it mean we need a new sysctl (persistent_txhash) to avoid sk_rethink_txhash() together such that it keeps the routing decision consistent (e.g. flowlabel) ?
Thanks, Shaohua