Re: [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()
From: Tejun Heo <tj@kernel.org>
Date: 2017-02-12 04:40:27
Hello, Jan.

On Thu, Feb 09, 2017 at 01:44:31PM +0100, Jan Kara wrote:
> When block device is closed, we call inode_detach_wb() in __blkdev_put()
> which sets inode->i_wb to NULL. That is contrary to expectations that
> inode->i_wb stays valid once set during the whole inode's lifetime and
> leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
> inode_to_wb() returned NULL.
>
> The reason why we called inode_detach_wb() is not valid anymore though.
> BDI is guaranteed to stay around until we call bdi_put() from
> bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
> moment.
>
> A complication is that i_wb can point to non-root wb_writeback structure
> and in that case we do need to clean it up as bdi_unregister() blocks
> waiting for all non-root wb_writeback references to get dropped. Thus
> this i_wb reference could block device removal e.g. from
> __scsi_remove_device() (which indirectly ends up calling
> bdi_unregister()). We cannot rely on block device inode to go away soon
> (and thus i_wb reference to get dropped) as the device may get
> hot-removed e.g. under a mounted filesystem.
>
> We deal with these issues by switching block device inode from non-root
> wb_writeback structure to bdi->wb when needed. Since this is rather
> expensive (requires synchronize_rcu()) we do the switching only in
> del_gendisk() when we know the device is going away.

So, the only reason cgwb_bdi_destroy() is synchronous is because bdi
destruction was synchronous. Now that bdi is properly reference
counted and can be decoupled from gendisk / q destruction, I can't
think of a reason to keep cgwb destruction synchronous. Switching
wb's on destruction is kinda clumsy and it almost always hurts to
expose synchronize_rcu() in userland visible paths.

Wouldn't something like the following work?

* Remove bdi->usage_cnt and the synchronous waiting in
  cgwb_bdi_destroy().

* Instead, make cgwb's hold bdi->refcnt and put it from
  cgwb_release_workfn().

Then, we don't have to switch during shutdown and can just let things
drain.

Thanks.

--
tejun
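
The lifetime rule in the two bullets above can be modeled in user space
roughly as follows. This is a minimal sketch, not kernel code: the
*_model types and functions are hypothetical stand-ins, the counters
are plain ints instead of atomics, and the release step runs inline
where the kernel would defer it to a work item. It only shows that once
each cgwb pins the bdi, teardown can drop its own reference without
waiting and the bdi goes away when the last cgwb is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted bdi. */
struct bdi_model {
	int refcnt;	/* models bdi->refcnt */
	bool freed;	/* set instead of freeing, so the demo can inspect it */
};

/* Hypothetical stand-in for a non-root (cgroup) wb. */
struct cgwb_model {
	struct bdi_model *bdi;
	int refcnt;
	bool released;
};

static void bdi_model_get(struct bdi_model *bdi)
{
	bdi->refcnt++;
}

static void bdi_model_put(struct bdi_model *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0)
		bdi->freed = true;
}

/* Creating a cgwb pins the bdi it belongs to. */
static void cgwb_model_init(struct cgwb_model *wb, struct bdi_model *bdi)
{
	bdi_model_get(bdi);
	wb->bdi = bdi;
	wb->refcnt = 1;
	wb->released = false;
}

/* Plays the role of the release work function: drop the bdi pin last. */
static void cgwb_model_release(struct cgwb_model *wb)
{
	wb->released = true;
	bdi_model_put(wb->bdi);
}

static void cgwb_model_put(struct cgwb_model *wb)
{
	assert(wb->refcnt > 0);
	if (--wb->refcnt == 0)
		cgwb_model_release(wb);
}
```

With one owner reference on the bdi and one cgwb attached, dropping the
owner reference first leaves the bdi alive with a count of 1; the final
cgwb_model_put() is what marks it freed. Nothing on the teardown side
has to wait or switch wbs.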