Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
From: Stephen Hemminger <hidden>
Date: 2011-10-19 18:41:35
Also in: netdev
On Wed, 19 Oct 2011 11:01:02 -0700 Jay Vosburgh [off-list ref] wrote:
> Mitsuo Hayasaka [off-list ref] wrote:
>> bond_close() calls cancel_delayed_work() to cancel delayed work items.
>> However, it cannot cancel work items that are already queued in the
>> workqueue. bond_open() initializes work->data, and process_one_work()
>> refers to get_work_cwq(work)->wq->flags. get_work_cwq() returns NULL
>> once work->data has been initialized, so a NULL pointer dereference
>> (panic) occurs.
>>
>> This patch uses flush_delayed_work_sync() instead of
>> cancel_delayed_work() in bond_close(). It cancels the delayed timer
>> and waits for the work to finish executing. This avoids the NULL
>> pointer dereference caused by process_one_work() running in parallel
>> with the initialization done in bond_open().
>
> I'm setting up to test this. I have a dim recollection that we tried
> this some years ago, and there was a different deadlock that
> manifested through the flush path. Perhaps changes since then have
> removed that problem.
>
> -J
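The patch relies on the difference between cancelling and flushing a delayed work item whose callback is already executing. Below is a minimal userspace sketch of that difference, written in C with pthreads rather than kernel code. The names model_work, model_cancel(), model_flush_sync() and model_demo() are hypothetical, and the sketch assumes the callback has already started, which is exactly the case cancel_delayed_work() does not wait for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of one delayed work item; not kernel code. */
struct model_work {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool pending;      /* timer armed, callback not started yet */
    bool running;      /* callback is executing right now */
    bool may_finish;   /* lets the demo hold the callback mid-run */
    pthread_t thread;
};

static void *model_callback(void *arg)
{
    struct model_work *w = arg;

    pthread_mutex_lock(&w->lock);
    w->pending = false;
    w->running = true;
    pthread_cond_broadcast(&w->cond);        /* announce: callback started */
    while (!w->may_finish)                   /* stand-in for real work */
        pthread_cond_wait(&w->cond, &w->lock);
    w->running = false;
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Like cancel_delayed_work(): clears a pending item and returns at once;
 * it does not wait for a callback that has already started. */
static bool model_cancel(struct model_work *w)
{
    pthread_mutex_lock(&w->lock);
    bool was_pending = w->pending;
    w->pending = false;
    pthread_mutex_unlock(&w->lock);
    return was_pending;
}

/* Like flush_delayed_work_sync() for an already-started callback:
 * returns only after the callback has finished. */
static void model_flush_sync(struct model_work *w)
{
    pthread_join(w->thread, NULL);
}

/* Reports whether the callback was still running right after cancel
 * and right after flush. */
static void model_demo(bool *running_after_cancel, bool *running_after_flush)
{
    struct model_work w = { .pending = true };
    int rc;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    rc = pthread_create(&w.thread, NULL, model_callback, &w);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&w.lock);
    while (!w.running)                       /* wait until it is executing */
        pthread_cond_wait(&w.cond, &w.lock);
    pthread_mutex_unlock(&w.lock);

    model_cancel(&w);                        /* returns immediately */
    pthread_mutex_lock(&w.lock);
    *running_after_cancel = w.running;
    w.may_finish = true;                     /* let the callback complete */
    pthread_cond_broadcast(&w.cond);
    pthread_mutex_unlock(&w.lock);

    model_flush_sync(&w);                    /* waits for completion */
    *running_after_flush = w.running;        /* safe: thread has exited */

    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
}
```

In this model, model_demo() reports the callback as still running right after model_cancel() returns, and as finished once model_flush_sync() returns. That completion guarantee is what the patch wants before bond_open() reinitializes the work item.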
Won't this deadlock on RTNL? The problem is:
CPU0                          CPU1
rtnl_lock
bond_close
                              delayed_work
                                mii_work
                                  read_lock(bond->lock);
                                  read_unlock(bond->lock);
                                  rtnl_lock... waiting for CPU0
flush_delayed_work_sync
  waiting for delayed_work to finish...
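To make the cycle in the diagram concrete, here is a minimal userspace sketch in C with pthreads (not kernel code). It assumes RTNL can be modeled as a plain mutex, bond->lock as a reader/writer lock, and flush_delayed_work_sync() as pthread_join() on a work item that is already running. The names rtnl, bond_lock, mii_work_model() and model_close_with_flush() are hypothetical. Because a real rtnl_lock() would block forever, the worker uses pthread_mutex_timedlock() with a 300 ms timeout so the deadlock can be observed instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: rtnl models the RTNL mutex,
 * bond_lock models bond->lock. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t bond_lock = PTHREAD_RWLOCK_INITIALIZER;
static bool worker_gave_up;

/* CPU1: the mii monitor work item. It takes bond->lock for reading,
 * drops it, then needs RTNL. A real rtnl_lock() would wait forever;
 * this model gives up after 300 ms so the cycle is observable. */
static void *mii_work_model(void *arg)
{
    struct timespec deadline;
    (void)arg;

    pthread_rwlock_rdlock(&bond_lock);
    /* ... inspect slave link state ... */
    pthread_rwlock_unlock(&bond_lock);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 300000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    if (pthread_mutex_timedlock(&rtnl, &deadline) == ETIMEDOUT) {
        worker_gave_up = true;   /* would have waited on CPU0 forever */
        return NULL;
    }
    pthread_mutex_unlock(&rtnl);
    return NULL;
}

/* CPU0: bond_close() runs with RTNL held and then flushes the work,
 * modeled as pthread_join(). Returns true when the work item could not
 * obtain RTNL while the closer was waiting for it to finish. */
static bool model_close_with_flush(void)
{
    pthread_t cpu1;
    int rc;

    worker_gave_up = false;
    pthread_mutex_lock(&rtnl);                          /* rtnl_lock() */
    rc = pthread_create(&cpu1, NULL, mii_work_model, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(cpu1, NULL);   /* flush_delayed_work_sync(): wait for work */
    pthread_mutex_unlock(&rtnl);                        /* rtnl_unlock() */
    return worker_gave_up;
}
```

In this model, model_close_with_flush() returns true on every run. RTNL is released only after the join returns, and the join returns only once the worker gives up on RTNL. Without the timeout, neither side could make progress.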