Thread (32 messages), 5 authors, 2011-02-03

Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing

From: Oleg V. Ukhno <hidden>
Date: 2011-01-18 15:28:49

On 01/18/2011 05:54 PM, Nicolas de Pesloüan wrote:
On 18/01/2011 13:40, Oleg V. Ukhno wrote:

The fact that there are many situations where it simply doesn't work
should not cause Oleg's idea to be rejected.

In Documentation/networking/bonding.txt, tuning tcp_reordering on
receiving side is already documented as a possible workaround for out of
order delivery due to load balancing of a single TCP session, using
mode=balance-rr.
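
To make the tcp_reordering workaround concrete, here is a minimal sketch
of checking the current value on the receiving host. It assumes a Linux
system exposing the sysctl at /proc/sys/net/ipv4/tcp_reordering; the
helper name read_tcp_reordering is hypothetical and appears neither in
the patch nor in bonding.txt.

```python
from pathlib import Path

# Standard Linux location of the net.ipv4.tcp_reordering sysctl.
TCP_REORDERING_PATH = Path("/proc/sys/net/ipv4/tcp_reordering")


def read_tcp_reordering(path=TCP_REORDERING_PATH):
    """Return the tcp_reordering value as an int, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux host), unreadable, or not a number.
        return None


if __name__ == "__main__":
    value = read_tcp_reordering()
    if value is None:
        print("tcp_reordering is not readable on this system")
    else:
        print(f"net.ipv4.tcp_reordering = {value}")
```

Raising the value is normally done with sysctl by an administrator; how
high it should go for a given number of slaves is something to measure,
not something this sketch can decide.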

This might work reasonably well in a pure LAN topology, without any
router between both ends of the TCP session, even if this is limited to
Linux hosts. The uses are not uncommon and not limited to iSCSI:
- between an application server and a database server,
- between members of a cluster, for replication purposes,
- between a server and a backup system,
- ...
Nicolas, thank you for your opinion. This is exactly what I mean:
iSCSI is just one particular use case, but there are many cases where
this load balancing method will be useful.
Of course, for longer paths, with routers and variable RTT, we would
need something different (possibly MultiPathTCP:
http://datatracker.ietf.org/wg/mptcp/).

I remember a topology (described by Jay, as far as I remember),
where two hosts were connected through two distinct VLANs. In such a
topology:
- it is possible to detect path failure using arp monitoring instead of
miimon.
- changing the destination MAC address of egress packets is not
necessary, because egress path selection forces ingress path selection
due to the VLAN.
In the case with two VLANs, yes, this shouldn't be necessary (but it
needs to be tested, I am not sure); within a single VLAN, however, it is
essential for correct rx load striping.
I think the only point is whether we need a new xmit_hash_policy for
mode=802.3ad or whether mode=balance-rr could be enough.
Maybe, but it seems fair to me not to restrict this feature to
non-LACP aggregate links only; dynamic aggregation may be useful (it
sometimes helps to survive switch misconfiguration, i.e. misconfigured
slaves on the switch side, without loss of service).
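
The difference between a hash-based xmit policy and round-robin striping
can be sketched as follows. This is a simplified model, not the patch's
code: the hash follows the layer2 formula given in bonding.txt, (source
MAC XOR destination MAC) modulo slave count, and the function names and
MAC addresses are made up for illustration.

```python
from itertools import count


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def layer2_hash_slave(src_mac, dst_mac, slave_count):
    """Simplified layer2 policy: (src MAC XOR dst MAC) modulo slave count.

    The result depends only on the address pair, so every packet between
    the same two hosts leaves through the same slave.
    """
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


def make_round_robin(slave_count):
    """Return a picker that cycles through slaves, one per packet."""
    counter = count()

    def pick():
        return next(counter) % slave_count

    return pick


if __name__ == "__main__":
    src, dst = "00:11:22:33:44:55", "00:11:22:33:44:66"  # made-up MACs
    slaves = 2

    hashed = [layer2_hash_slave(src, dst, slaves) for _ in range(6)]
    pick = make_round_robin(slaves)
    striped = [pick() for _ in range(6)]

    print("layer2 hash :", hashed)   # same slave index for every packet
    print("round-robin :", striped)  # slave index alternates per packet
```

With a hash policy a single TCP session is pinned to one slave and cannot
exceed that link's bandwidth; per-packet round-robin spreads the session
over all slaves at the cost of possible reordering on the receiver.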
Oleg, would you mind trying the above "two VLAN" topology with
mode=balance-rr and reporting any results? For high-availability purposes,
it's obviously necessary to set up those VLANs on distinct switches.
I'll do it, but it will take some time to set up a test environment,
maybe several days.
You mean the following topology:
           switch 1
        /           \
host A                host B
        \  switch 2 /

(I'm sure it will work as desired if each host is connected to each
switch with only one slave link; if there are more slaves per switch,
I'm unsure)?
Nicolas


-- 
Best regards,
Oleg Ukhno.
ITO Team Lead,
Yandex LLC.

