Thread: 9 messages, 3 authors, 2012-08-30

Re: [PATCH 1/3] ext4: nonda_switch prevent deadlock

From: Akira Fujita <hidden>
Date: 2012-08-30 11:13:13

Hi,

(2012/08/29 22:28), Jan Kara wrote:
On Tue 28-08-12 20:21:41, Dmitry Monakhov wrote:
Currently ext4_da_write_begin() may deadlock if called with an open journal
transaction. Real-life example:
->move_extent_per_page()
   ->ext4_journal_start()-> hold journal transaction
   ->write_begin()
     ->ext4_da_write_begin()
       ->ext4_nonda_switch()
         ->writeback_inodes_sb_if_idle()  --> will wait for journal_stop()

This bug could easily be fixed by reordering the code, but in some cases
it should be possible to call write_begin() while holding a journal
transaction. In that case the caller can avoid recursion by passing the
AOP_FLAG_NOFS flag.
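
The gating described above can be modelled in userspace as a rough sketch.
This is not kernel code: every model_* name and MODEL_FLAG_NOFS are
hypothetical stand-ins, the struct replaces the real superblock counters,
and only the "push delalloc" check visible in the patch is modelled (the
fallback thresholds of ext4_nonda_switch() are left out). Where the patch
uses BUG_ON(), the model returns -1 so the case can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for AOP_FLAG_NOFS; the value is illustrative only. */
#define MODEL_FLAG_NOFS 0x0004u

struct model_fs {
	long free_blocks;
	long dirty_blocks;
	bool handle_open;	/* models journal_current_handle() != NULL */
	int writeback_calls;	/* times writeback was kicked off */
	int deadlocks;		/* writeback started while a handle was open */
};

/* Models writeback_inodes_sb_if_idle(): it would wait for journal_stop(). */
static void model_writeback(struct model_fs *fs)
{
	fs->writeback_calls++;
	if (fs->handle_open)
		fs->deadlocks++;
}

/* Models only the gated check the patch adds to ext4_nonda_switch(). */
static int model_nonda_switch(struct model_fs *fs, bool writeback_allowed)
{
	if (writeback_allowed && fs->free_blocks < 2 * fs->dirty_blocks)
		model_writeback(fs);
	return 0;
}

/* Models the entry check and flag handling in ext4_da_write_begin(). */
static int model_da_write_begin(struct model_fs *fs, unsigned int flags)
{
	bool nofs = (flags & MODEL_FLAG_NOFS) != 0;

	/* The patch uses BUG_ON() here; the model reports an error instead. */
	if (!nofs && fs->handle_open)
		return -1;

	return model_nonda_switch(fs, !nofs);
}
```

In this model, a caller that already holds a handle and passes
MODEL_FLAG_NOFS never reaches model_writeback(), so the deadlock counter
stays at zero. A caller without a handle and without the flag still
triggers writeback once free_blocks drops below 2 * dirty_blocks.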
   Well, I find calling ext4_write_begin() with a transaction started a bug.
Possibly ext4_write_begin() can be tweaked to handle that but things would
be simpler if we didn't have to.

Looking into move_extent_per_page(), calling ->write_begin() doesn't seem
to be quite right there anyway. For example it results in filling holes
under that page, which is not desirable. I'm not even sure why we call
->write_begin() there at all. The data in the page is unchanged. So it
should be enough to just remap buffers and mark the page dirty (but I might
be missing some subtlety here). Fujita-san, can you possibly explain?

Originally, write_begin/end was called in move_extent_per_page() to get a
page and to mark the bh exchanged by mext_replace_branches() as dirty.
But if there is a better way to do this, it makes sense to fix it.

Regards,
Akira Fujita
								Honza
---
  fs/ext4/inode.c |   28 +++++++++++++++++-----------
  1 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6324f74..d12d30e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -889,6 +889,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
  	struct page *page;
  	pgoff_t index;
  	unsigned from, to;
+	int nofs = flags & AOP_FLAG_NOFS;
+
+	/* We cannot recurse into the filesystem if the transaction is already
+	 * started */
+	BUG_ON(!nofs && journal_current_handle());
  
  	trace_ext4_write_begin(inode, pos, len, flags);
  	/*
@@ -906,9 +911,6 @@ retry:
  		ret = PTR_ERR(handle);
  		goto out;
  	}
-
-	/* We cannot recurse into the filesystem as the transaction is already
-	 * started */
  	flags |= AOP_FLAG_NOFS;
  
  	page = grab_cache_page_write_begin(mapping, index, flags);
@@ -957,7 +959,8 @@ retry:
  		}
  	}
  
-	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+	if (!nofs && ret == -ENOSPC &&
+	    ext4_should_retry_alloc(inode->i_sb, &retries))
  		goto retry;
  out:
  	return ret;
@@ -2447,7 +2450,7 @@ out_writepages:
  }
  
  #define FALL_BACK_TO_NONDELALLOC 1
-static int ext4_nonda_switch(struct super_block *sb)
+static int ext4_nonda_switch(struct super_block *sb, int writeback_allowed)
  {
  	s64 free_blocks, dirty_blocks;
  	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -2475,7 +2478,7 @@ static int ext4_nonda_switch(struct super_block *sb)
  	 * Even if we don't switch but are nearing capacity,
  	 * start pushing delalloc when 1/2 of free blocks are dirty.
  	 */
-	if (free_blocks < 2 * dirty_blocks)
+	if (writeback_allowed && free_blocks < 2 * dirty_blocks)
  		writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
  
  	return 0;
@@ -2490,10 +2493,14 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
  	pgoff_t index;
  	struct inode *inode = mapping->host;
  	handle_t *handle;
+	int nofs = flags & AOP_FLAG_NOFS;
  
  	index = pos >> PAGE_CACHE_SHIFT;
+	/* We cannot recurse into the filesystem if the transaction is already
+	 * started */
+	BUG_ON(!nofs && journal_current_handle());
  
-	if (ext4_nonda_switch(inode->i_sb)) {
+	if (ext4_nonda_switch(inode->i_sb, !nofs)) {
  		*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
  		return ext4_write_begin(file, mapping, pos,
  					len, flags, pagep, fsdata);
@@ -2512,8 +2519,6 @@ retry:
  		ret = PTR_ERR(handle);
  		goto out;
  	}
-	/* We cannot recurse into the filesystem as the transaction is already
-	 * started */
  	flags |= AOP_FLAG_NOFS;
  
  	page = grab_cache_page_write_begin(mapping, index, flags);
@@ -2538,7 +2543,8 @@ retry:
  			ext4_truncate_failed_write(inode);
  	}
  
-	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+	if (!nofs && ret == -ENOSPC &&
+	    ext4_should_retry_alloc(inode->i_sb, &retries))
  		goto retry;
  out:
  	return ret;
@@ -4791,7 +4797,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
  	/* Delalloc case is easy... */
  	if (test_opt(inode->i_sb, DELALLOC) &&
  	    !ext4_should_journal_data(inode) &&
-	    !ext4_nonda_switch(inode->i_sb)) {
+	    !ext4_nonda_switch(inode->i_sb, 1)) {
  		do {
  			ret = __block_page_mkwrite(vma, vmf,
  						   ext4_da_get_block_prep);
-- 
1.7.7.6

-- 
Akira Fujita [off-list ref]

The First Fundamental Software Development Group,
Platform Division, NEC Software Tohoku, Ltd.