Thread (15 messages) 15 messages, 3 authors, 2012-03-19

Re: [PATCH 4/4] writeback: Avoid iput() from flusher thread

From: Jan Kara <jack@suse.cz>
Date: 2012-03-19 10:46:59
Also in: linux-fsdevel, lkml

On Mon 19-03-12 04:55:15, Christoph Hellwig wrote:
On Fri, Mar 09, 2012 at 10:02:28AM +0100, Jan Kara wrote:
quoted
Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
because iput() can do a lot of work - for example truncate the inode if it's
the last iput on unlinked file. Some filesystems (e.g. ubifs) may need to
allocate blocks during truncate (due to their COW nature) and in some cases
they thus need to flush dirty data from truncate to reduce uncertainty in the
amount of free space. This effectively creates a deadlock.

We get rid of iput() in flusher thread by using the fact that I_SYNC inode
flag effectively pins the inode in memory. So if we take care to either hold
i_lock or have I_SYNC set, we can get away without taking inode reference
in writeback_sb_inodes().

As a side effect, we also fix possible use-after-free in wb_writeback() because
inode_wait_for_writeback() call could try to reacquire i_lock on the inode that
was already free.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c         |   38 ++++++++++++++++++++++++--------------
 fs/inode.c                |   11 ++++++++++-
 include/linux/fs.h        |    7 ++++---
 include/linux/writeback.h |    7 +------
 4 files changed, 39 insertions(+), 24 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1e8bf44..f9f9b61 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -325,19 +325,21 @@ static int write_inode(struct inode *inode, struct writeback_control *wbc)
 }
 
 /*
- * Wait for writeback on an inode to complete.
+ * Wait for writeback on an inode to complete. Called with i_lock held.
+ * Return 1 if we dropped i_lock and waited, 0 is returned otherwise.
  */
-static void inode_wait_for_writeback(struct inode *inode)
+int __must_check inode_wait_for_writeback(struct inode *inode)
 {
 	DEFINE_WAIT_BIT(wq, &inode->i_state, __I_SYNC);
 	wait_queue_head_t *wqh;
 
 	wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
+	if (inode->i_state & I_SYNC) {
 		spin_unlock(&inode->i_lock);
 		__wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
+		return 1;
 	}
+	return 0;
This is a horribly ugl primitive.

I'd rather add a

void inode_wait_for_writeback(struct inode *inode)
{
 	DEFINE_WAIT_BIT(wq, &inode->i_state, __I_SYNC);
 	wait_queue_head_t *wqh = bit_waitqueue(&inode->i_state, __I_SYNC);

	__wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
}

and opencode all the locking ad I_SYNC checking logic in the callers.
  I agree the primitive is ugly. And actually it is buggy the way I wrote
it. It should have been:
  __wait_on_bit(wqh, &wq, isync_wait, TASK_UNINTERRUPTIBLE);

where isync_wait is:

int isync_wait(void *word)
{
	struct inode *inode = container_of(word, struct inode, i_state);

	spin_unlock(&inode->i_lock);
	schedule();
	return 1;
}

  The problem is i_lock pins the inode for us in some cases. So once we
drop i_lock, inode can go away so we cannot test the bit anymore.

But there are just two places where we really need this. So maybe I can
just opencode it there and for others use normal obvious variant.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help