Thread: 13 messages, 5 authors, 2011-10-30

Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close

From: Stephen Hemminger <hidden>
Date: 2011-10-19 18:41:35
Also in: lkml

On Wed, 19 Oct 2011 11:01:02 -0700
Jay Vosburgh [off-list ref] wrote:
Mitsuo Hayasaka [off-list ref] wrote:
bond_close() calls cancel_delayed_work() to cancel delayed works.
However, it cannot cancel works that were already queued in the workqueue.
bond_open() initializes work->data, and process_one_work() refers to
get_work_cwq(work)->wq->flags. get_work_cwq() returns NULL when
work->data has been initialized. Thus, a panic occurs.

This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
in bond_close(). It cancels the delayed timer and waits for the work to
finish execution. So, it can avoid the NULL pointer dereference due to the
parallel execution of process_one_work() and the initialization done
by bond_open().
	I'm setting up to test this.  I have a dim recollection that we
tried this some years ago, and there was a different deadlock that
manifested through the flush path.  Perhaps changes since then have
removed that problem.

	-J
Won't this deadlock on RTNL?  The problem is that:

   CPU0                            CPU1
  rtnl_lock
      bond_close
                                 delayed_work
                                   mii_work
                                     read_lock(bond->lock);
                                     read_unlock(bond->lock);
                                     rtnl_lock... waiting for CPU0
      flush_delayed_work_sync
          waiting for delayed_work to finish...
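
The ordering above can be modelled in userspace to make the cycle concrete. This is only a sketch under stated assumptions, not kernel code: a pthread mutex named rtnl stands in for the RTNL, pthread_join() stands in for flush_delayed_work_sync(), and the helper names mii_work_model() and close_would_deadlock() are hypothetical. The worker uses pthread_mutex_trylock() instead of a blocking lock so that the model reports the problem rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumption: a plain pthread mutex stands in for the kernel's RTNL. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for mii_work on CPU1: it needs the "RTNL" to complete.
 * trylock is used so the model can report "would block" instead of
 * actually blocking forever.
 */
static void *mii_work_model(void *arg)
{
    bool *would_block = arg;
    int rc = pthread_mutex_trylock(&rtnl);

    if (rc == 0) {
        *would_block = false;          /* lock was free, work can finish */
        pthread_mutex_unlock(&rtnl);
    } else {
        *would_block = (rc == EBUSY);  /* lock held by the closing thread */
    }
    return NULL;
}

/*
 * Stand-in for bond_close on CPU0. When hold_rtnl_during_flush is true,
 * the "RTNL" stays held while waiting for the work to finish, which is
 * the ordering shown in the diagram. Returns true if the worker would
 * have blocked on the lock, i.e. the real code would deadlock.
 */
static bool close_would_deadlock(bool hold_rtnl_during_flush)
{
    pthread_t worker;
    bool would_block = false;
    int rc;

    pthread_mutex_lock(&rtnl);         /* rtnl_lock */
    if (!hold_rtnl_during_flush)
        pthread_mutex_unlock(&rtnl);   /* drop it before waiting */

    rc = pthread_create(&worker, NULL, mii_work_model, &would_block);
    assert(rc == 0);
    (void)rc;

    /* Models the flush: wait until the work has finished running. */
    pthread_join(worker, NULL);

    if (hold_rtnl_during_flush)
        pthread_mutex_unlock(&rtnl);
    return would_block;
}
```

In this model close_would_deadlock(true) reports that the worker could not take the lock while the closer was waiting on it, and close_would_deadlock(false) reports no conflict. With a blocking lock in the worker, the first case would never return, because each side waits for the other.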
              
                                    