Re: Possible race with br_del_if()
From: Stephen Hemminger <hidden>
Date: 2005-08-19 19:40:42
On Fri, 19 Aug 2005 14:10:52 -0500 Ryan Harper [off-list ref] wrote:
> * Stephen Hemminger [off-list ref] [2005-08-18 17:36]:
> > On Thu, 18 Aug 2005 17:23:23 -0500 Ryan Harper [off-list ref] wrote:
> > > * Stephen Hemminger [off-list ref] [2005-08-18 17:11]:
> > > > On Thu, 18 Aug 2005 16:40:36 -0500 Ryan Harper [off-list ref] wrote:
> > > > > Hello, I've encountered several oopses when adding and removing
> > > > > interfaces from bridges while using Xen. Most of the details
> > > > > are available [1]here. The short of it is the following
> > > > > sequence:
> > > >
> > > > Doesn't the mutex in RTNL work right? Or are you calling routines
> > > > without asserting it?
> > >
> > > unregister_netdevice asserts RTNL; add_del_if() in br_ioctl.c
> > > doesn't seem to do so. I don't see it down the dev_get_by_index()
> > > path either. It looks like any caller of add_del_if() isn't
> > > asserting RTNL. The two callers I see are:
> > >
> > >   br_dev_ioctl() in br_ioctl.c
> > >   old_dev_ioctl() in br_ioctl.c
> >
> > But the path to br_dev_ioctl() is via the socket ioctl, and that
> > should already have gotten RTNL.
> >
> >   dev_ioctl
> >     rtnl_lock()
> >     dev_ifsioc()
> >       dev->do_ioctl --> br_dev_ioctl
>
> Just to follow up, the issue was a race between the call_rcu()
> callback for destroy_nbp() and an unregister_netdev() call. Sometimes
> the br_device_event() routine was triggered before destroy_nbp() had
> run, leaving dev->br_port non-NULL, in which case br_device_event()
> correctly calls br_del_if(). We caused this by issuing a brctl delif
> from userspace scripts and having an in-kernel handler invoke
> unregister_netdev(). Our fix is to not bother calling brctl delif,
> because the unregister_netdev() call will automatically remove the
> device from the bridge when the notify_call_chain() kicks in from
> unregister_netdevice().
I'll get back to you; this needs some review, and I have a bunch of old test suites to dig up for it.