Thread: 6 messages, 3 authors, 2011-12-30

RE: bonding device in balance-alb mode shows packet loss in kernel 3.2-rc6

From: <hidden>
Date: 2011-12-30 12:22:14
Subsystem: bonding driver, networking drivers, the rest · Maintainers: Jay Vosburgh, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds

-----Original Message-----
From: Jay Vosburgh [mailto:fubar@us.ibm.com]
Sent: Thursday, December 29, 2011 1:39 AM
To: K, Narendra
Cc: netdev@vger.kernel.org
Subject: Re: bonding device in balance-alb mode shows packet loss in kernel
3.2-rc6

[off-list ref] wrote:
> By observing the packets on remote HOST2, the sequence is
>
> 1. 'bond0' broadcasts an ARP request with source MAC equal to 'bond0'
> MAC address and receives an ARP response to the same.
> Next few packets are received.
	In this case, it means the peer has been assigned to the "em2"
slave.
> 2. After some time, there are 2 ARP replies from 'bond0' to HOST2 with
> source MAC equal to 'inactive slave' MAC id. Now HOST2 sends ICMP
> response with destination MAC equal to inactive slave MAC id and these
> packets are dropped.
	This part is not unusual for the balance-alb mode; the traffic is
periodically rebalanced, and in this case the peer HOST2 was likely assigned
to a different slave than it was previously.  I'm not sure why the packets don't
reach their destination, but they shouldn't be dropped due to the slave being
"inactive," as I explained above.
> The wireshark protocol trace is attached to this note.
>
> 3. The behavior was independent of the network adapter models.
>
> 4. Also, I had a few prints in 'eth_type_trans' and it seemed like the
> 'inactive slave' was not receiving any frames destined to it
> (00:21:9b:9d:a5:74) except ARP broadcasts.
> Setting the 'inactive slave' in 'promisc' mode made bond0 see the responses.
	This seems very strange, since the MAC information shown later
suggests that the slaves all are using their original MAC addresses, so the
packets ought to be delivered.

	I'm out of the office until next week, so I won't have an opportunity
to try and reproduce this myself until then.  I wonder if something in the
rx_handler changes over the last few months has broken this, although a
look at the code suggests that it should be doing the right things.

Hi Jay, thanks for looking into this. I am out of office next week.
I am copying Surya in case additional information is required.
(Please keep Surya in CC.)

It was strange that 'eth_type_trans' showed only ARP broadcasts for
em3 and em4. Interestingly, when I manually set the perm HW address of
em3 with

ifconfig em3 hw ether 00:21:9b:9d:a5:74

packet drops stopped and 'eth_type_trans' showed unicast frames
destined to 00:21:9b:9d:a5:74.

I put a few debug prints in 'bnx2_set_mac_addr' to see what MAC ids are
getting set in the hardware. When I stopped and started bond0,
all the slaves seemed to have the same MAC id
(that of em2 and bond0, 00:21:9b:9d:a5:72).
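
As a rough illustration of why that would match the symptoms, here is a
small userspace model of a NIC receive filter. It is only a sketch and not
bnx2 code: 'struct nic_filter', 'nic_accepts' and 'MAC_LEN' are made-up
names, multicast is ignored, and it assumes the NIC passes up only
broadcasts and unicast frames addressed to the MAC programmed into it,
unless it is in promiscuous mode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* length of an Ethernet address in bytes */

/* Hypothetical model of a NIC receive filter (illustration only). */
struct nic_filter {
	uint8_t hw_mac[MAC_LEN]; /* unicast address programmed into hardware */
	bool promisc;            /* promiscuous mode: accept every frame */
};

/* True if dst is the Ethernet broadcast address ff:ff:ff:ff:ff:ff. */
static bool is_broadcast(const uint8_t *dst)
{
	static const uint8_t bcast[MAC_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	return memcmp(dst, bcast, MAC_LEN) == 0;
}

/* True if the modelled NIC would pass a frame with this destination up. */
static bool nic_accepts(const struct nic_filter *nic, const uint8_t *dst)
{
	if (nic->promisc || is_broadcast(dst))
		return true;
	return memcmp(dst, nic->hw_mac, MAC_LEN) == 0;
}
```

In this model, a slave whose hardware holds 00:21:9b:9d:a5:72 rejects
unicast frames sent to 00:21:9b:9d:a5:74 but still accepts ARP broadcasts,
and accepts everything once 'promisc' is set.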

Also, the following change made the packet drops stop, and the prints in
'bnx2_set_mac_addr' seemed to indicate that each slave got a unique
MAC id set in hardware.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7f87568..e717267 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1620,7 +1620,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
         */
        memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);

-       if (!bond->params.fail_over_mac) {
+       if (!bond->params.fail_over_mac && !bond_is_lb(bond)) {
                /*
                 * Set slave to master's mac address.  The application already
                 * set the master's mac address to that of the first slave
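
To spell out what the one-line change does to this branch, here is a
userspace sketch of the condition before and after. The enum and function
names are invented for illustration; 'mode_is_lb' mirrors what
bond_is_lb() tests (balance-tlb or balance-alb). It models only this
branch of bond_enslave(); other parts of the driver may still change
slave addresses.

```c
#include <stdbool.h>

/* Mode numbers as accepted by the bonding 'mode' parameter. */
enum bond_mode_sketch {
	MODE_ROUNDROBIN   = 0,
	MODE_ACTIVEBACKUP = 1,
	MODE_XOR          = 2,
	MODE_BROADCAST    = 3,
	MODE_8023AD       = 4,
	MODE_TLB          = 5,
	MODE_ALB          = 6,
};

/* Hypothetical stand-in for bond_is_lb(): true for balance-tlb/alb. */
static bool mode_is_lb(int mode)
{
	return mode == MODE_TLB || mode == MODE_ALB;
}

/* Before the change: only fail_over_mac decides. */
static bool takes_master_mac_before(int mode, int fail_over_mac)
{
	(void)mode; /* the mode is not consulted in the original condition */
	return !fail_over_mac;
}

/* After the change: load-balancing modes skip this branch as well. */
static bool takes_master_mac_after(int mode, int fail_over_mac)
{
	return !fail_over_mac && !mode_is_lb(mode);
}
```

With fail_over_mac at 0, the two conditions differ only for balance-tlb
and balance-alb; every other mode takes the branch as before.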

With regards,
Narendra K
 