Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno <hidden>
Date: 2011-01-18 15:28:49
On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
On 18/01/2011 13:40, Oleg V. Ukhno wrote:

The fact that there are many situations where it simply doesn't work should not cause Oleg's idea to be rejected.

In Documentation/networking/bonding.txt, tuning tcp_reordering on the receiving side is already documented as a possible workaround for out-of-order delivery caused by load balancing a single TCP session with mode=balance-rr. This might work reasonably well in a pure LAN topology, without any router between the two ends of the TCP session, even if it is limited to Linux hosts.

The use cases are not uncommon and not limited to iSCSI:
- between an application server and a database server,
- between members of a cluster, for replication purposes,
- between a server and a backup system,
- ...
Nicolas, thank you for your opinion. This is exactly what I mean: iSCSI is just one particular use case, and there are many other cases where this load-balancing method will be useful.
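To illustrate the out-of-order delivery being discussed, here is a toy simulation, not kernel code. It assumes one packet is sent per time unit, striped round-robin over links with fixed one-way delays. The function names and the "reorder distance" measure are hypothetical and are not the metric the kernel uses for tcp_reordering.

```python
def arrival_order(n_packets, link_delays):
    """Send packet i at time i on link i % len(link_delays).

    Returns the sequence numbers in the order they arrive.
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delays)
        arrivals.append((seq + link_delays[link], seq))
    arrivals.sort()  # order by arrival time, ties broken by sequence number
    return [seq for _, seq in arrivals]


def max_reorder_distance(order):
    """Largest gap between the highest sequence seen so far and a late packet."""
    highest = -1
    worst = 0
    for seq in order:
        if seq < highest:
            worst = max(worst, highest - seq)
        highest = max(highest, seq)
    return worst


# Two links with equal delay: packets arrive in order.
equal = arrival_order(8, [1.0, 1.0])
# Second link is slower (assumed figure): packets on it arrive late.
skewed = arrival_order(8, [0.0, 4.5])

print(equal, max_reorder_distance(equal))
print(skewed, max_reorder_distance(skewed))
```

The larger the delay difference between the slave links, the further a packet can arrive behind its successors, which is why the workaround is more plausible on a pure LAN than across routers with variable RTT.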
Of course, for longer paths, with routers and variable RTT, we would need something different (possibly MultiPathTCP: http://datatracker.ietf.org/wg/mptcp/).

I remember a topology (described by Jay, as far as I remember) where two hosts were connected through two distinct VLANs. In such a topology:
- it is possible to detect path failure using arp monitoring instead of miimon.
- changing the destination MAC address of egress packets is not necessary, because egress path selection forces ingress path selection due to the VLAN.
With two VLANs, yes, this shouldn't be necessary (though it needs to be tested, I am not sure). Within a single VLAN, however, it is essential for correct rx load striping.
I think the only point is whether we need a new xmit_hash_policy for mode=802.3ad or whether mode=balance-rr could be enough.
Maybe, but it seems fair to me not to restrict this feature to non-LACP aggregate links. Dynamic aggregation can be useful: it sometimes helps to survive switch misconfiguration (misconfigured slave ports on the switch side) without loss of service.
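For readers following the policy question, a minimal sketch of the difference between per-flow hashing and per-packet round-robin slave selection. This is not the bonding driver's code: the function names are hypothetical, and the hash is a simplified version of the layer2 formula described in bonding.txt (source MAC XOR destination MAC, modulo slave count), applied here to the last byte of each address only.

```python
from itertools import count


def hash_slave(src_mac: bytes, dst_mac: bytes, n_slaves: int) -> int:
    """Per-flow selection: XOR the last byte of both MACs, modulo slave count."""
    return (src_mac[-1] ^ dst_mac[-1]) % n_slaves


def make_round_robin(n_slaves: int):
    """Per-packet selection: cycle through the slaves regardless of addresses."""
    counter = count()

    def pick() -> int:
        return next(counter) % n_slaves

    return pick


# Example addresses and slave count are made up for illustration.
src = bytes.fromhex("001122334401")
dst = bytes.fromhex("001122334402")
n_slaves = 2

# Six packets of the same session under each policy.
hashed = [hash_slave(src, dst, n_slaves) for _ in range(6)]
rr_pick = make_round_robin(n_slaves)
striped = [rr_pick() for _ in range(6)]

print("hash policy :", hashed)
print("round-robin :", striped)
```

With a hash policy, every packet of a single session lands on the same slave, so one TCP session cannot exceed the speed of one link. Round-robin spreads the same session over all slaves, at the cost of possible reordering.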
Oleg, would you mind trying the above "two VLAN" topology with mode=balance-rr and reporting the results? For high-availability purposes, it is obviously necessary to set up those VLANs on distinct switches.
I'll do it, but it will take some time to set up a test environment, maybe several days.
Do you mean the following topology?

          switch 1
         /        \
    host A        host B
         \        /
          switch 2

I'm sure it will work as desired if each host is connected to each switch with only one slave link. If there are more slaves per switch, I am not sure.

Nicolas
-- Best regards, Oleg Ukhno. ITO Team Lead, Yandex LLC.