Re: [PATCH] bonding: rejoin multicast groups on VLANs
From: Flavio Leitner <hidden>
Date: 2010-09-29 20:38:47
On Wed, Sep 29, 2010 at 03:54:11PM -0400, Andy Gospodarek wrote:
On Wed, Sep 29, 2010 at 04:35:39PM -0300, Flavio Leitner wrote:quoted
On Wed, Sep 29, 2010 at 02:44:13PM -0400, Andy Gospodarek wrote:quoted
On Wed, Sep 29, 2010 at 04:12:24AM -0300, Flavio Leitner wrote:quoted
It fixes bonding to rejoin multicast groups added to VLAN devices on top of bonding when a failover happens. The first packet may be discarded, so the timer assure that at least 3 Reports are sent.Good find, Flavio. Clearly the fact that multicast membership is broken needs to be fixed, but I would rather not see timers used at all. We worked hard in the past to eliminate timers for several reasons, so I would rather see a workqueue used.I noticed that the code is using workqueues now, just thought a simple thing which may run couple times would fit perfectly with a simple timer.Timers runs in softirq context, so I'd rather not add code that takes locks and runs in softirq context.quoted
quoted
I also don't like retransmitting the membership report 3 times when it may not be needed. Though many switches can handle it, the cost of receiving and processing what might be a large list of multicast addresses every 200ms for 600ms doesn't seem ideal. It also feels like a hack. :)Definitely a parameter is much better, but I wasn't sure about the patch approach so I was expecting a review like this and then do the refinements needed. Better to post early, right? :) I see your point to change the default to one membership report, but we can't assure during a failover if everything has been received. Also, it isn't supposed to keep failing flooding the network, so I would rather have couple membership reports being send than watch an important multicast application failing. Perhaps 3 is too much, but one sounds too few to me. what you think?Adding a tunable parameter allows the administrator to decide how many is enough. I would rather keep the default at one and add the tunable parameter (which needs to be added to bond_sysfs.c to be effective). I have not heard loud complaints about only sending one since the code to send retransmits of membership reports was added a few years ago, so I'm inclined to think it is working well for most users (or no one is using bonding). Maybe it would be best to break this into 2 patches. One that simply fixes the failover code so it works with VLANs (that could be done easily today) and another patch that can add the code to send multiple retransmits. Would you be willing to do that?
Sure, I can do it and then start another testing session here. -- Flavio