Thread (33 messages), 6 authors, 2008-08-21

Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails

From: Mingming Cao <hidden>
Date: 2008-08-05 21:03:14
Also in: linux-fsdevel

On Mon, 2008-08-04 at 20:10 +0900, Hisashi Hifumi wrote:
Hi

Dio write returns EIO when try_to_release_page fails because bh is
still referenced.
The patch 
"commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
Author: Mingming Cao [off-list ref]
Date:   Fri Jul 25 01:46:22 2008 -0700

    jbd: fix race between free buffer and commit transaction
" 
was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.
I ran fsstress heavily against 2.6.27-rc1 and found that dio writes
still sometimes returned EIO during the test.
:( I thought we beat that race pretty hard already.

Could you send me the fsstress command to reproduce the race?
The patch above fixed the race between freeing a buffer (dio) and
committing a transaction (jbd), but I discovered that there is another
race, between freeing a buffer (dio) and ext3/4_ordered_writepage.
: background_writeout()
     ->write_cache_pages()
       ->ext3_ordered_writepage()
           walk_page_buffers() <- take a bh ref
           block_write_full_page() <- unlock_page
                : <- end_page_writeback
                : <- race! (dio write->try_to_release_page fails)
           walk_page_buffers() <- release a bh ref

ext3_ordered_writepage takes a bh ref and then does unlock_page while
still holding that ref, so this causes the race and the failure of
try_to_release_page.
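
To make the failure mode concrete, here is a minimal userspace sketch of why a held bh ref blocks the release. It is not kernel code: struct toy_bh and the toy_* helpers are hypothetical stand-ins, the plain int counter replaces the kernel's atomic b_count, and the check only models the "any buffer still referenced" condition, not the dirty or locked state that the real code also looks at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the two fields
 * that matter for this race are modelled. */
struct toy_bh {
	int b_count;                 /* reference count (atomic_t in the kernel) */
	struct toy_bh *b_this_page;  /* circular list of buffers on one page */
};

/* Link n buffers into the circular per-page list, all unreferenced. */
static void toy_link_ring(struct toy_bh *bhs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bhs[i].b_count = 0;
		bhs[i].b_this_page = &bhs[(i + 1) % n];
	}
}

/* Writepage side: add delta (+1 to take, -1 to drop) to the refcount
 * of every buffer on the page, as the two walk_page_buffers() passes do. */
static void toy_adjust_refs(struct toy_bh *head, int delta)
{
	struct toy_bh *bh = head;

	do {
		bh->b_count += delta;
		bh = bh->b_this_page;
	} while (bh != head);
}

/* Release side: the buffers can only be dropped if no buffer on the
 * ring is still referenced. */
static bool toy_try_to_free_buffers(struct toy_bh *head)
{
	struct toy_bh *bh = head;

	do {
		if (bh->b_count > 0)
			return false;    /* someone still holds a ref */
		bh = bh->b_this_page;
	} while (bh != head);
	return true;
}
```

In this model, a release attempt made between toy_adjust_refs(head, +1) and toy_adjust_refs(head, -1) fails even though nothing is under writeback any more, which is the window marked "race!" in the trace above.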
I thought about this before, and the race seemed unlikely to me. Perhaps I
missed something, but the DIO code already waits for all pending IO to
finish before calling try_to_release_page, which eventually calls
journal_try_to_free_buffers(). During this call, the inode mutex is held
to prevent a new writer (buffered/DIO) from re-dirtying the pages. If
background writeout is happening when DIO kicks in, DIO will first wait
for the writeback bit to clear on all the pages. Here is the stack:

generic_file_aio_write()
  -> mutex_lock(&inode->i_mutex);
  -> __generic_file_aio_write_nolock()
     -> generic_file_direct_IO()
        ->filemap_write_and_wait()
           -> filemap_fdatawait()
              -> wait_on_page_writeback_range()
                 (==== waiting for pending IO to finish ====)
      ->invalidate_inode_pages2_range()
          ->invalidate_inode_pages2()
             ->try_to_release_page()
                ->ext3_releasepage()
                    ->journal_try_to_free_buffers()
Following patch fixes this race.
Thanks.

Signed-off-by: Hisashi Hifumi [off-list ref]

diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1/fs/jbd/transaction.c
--- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
+++ linux-2.6.27-rc1/fs/jbd/transaction.c	2008-07-29 20:40:12.000000000 +0900
@@ -1764,6 +1764,12 @@ int journal_try_to_free_buffers(journal_
 	*/
 	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
 		journal_wait_for_transaction_sync_data(journal);
+
+		bh = head;
+		do {
+			while (atomic_read(&bh->b_count))
+				schedule();
+		} while ((bh = bh->b_this_page) != head);
 		ret = try_to_free_buffers(page);
 	}
diff -Nrup linux-2.6.27-rc1.org/fs/jbd2/transaction.c linux-2.6.27-rc1/fs/jbd2/transaction.c
--- linux-2.6.27-rc1.org/fs/jbd2/transaction.c	2008-07-29 19:28:47.000000000 +0900
+++ linux-2.6.27-rc1/fs/jbd2/transaction.c	2008-07-29 20:56:42.000000000 +0900
@@ -1583,6 +1583,12 @@ int jbd2_journal_try_to_free_buffers(jou
 	*/
 	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
 		jbd2_journal_wait_for_transaction_sync_data(journal);
+
+		bh = head;
+		do {
+			while (atomic_read(&bh->b_count))
+				schedule();
+		} while ((bh = bh->b_this_page) != head);
 		ret = try_to_free_buffers(page);
 	}
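
The wait loop added by both hunks can be read in isolation with the following single-threaded userspace sketch. It keeps the same do/while walk over b_this_page, but everything else is an assumption made for illustration: struct ring_bh, wait_for_bh_refs() and drop_one_ref() are hypothetical names, a plain int replaces atomic_read(), and a callback stands in for schedule() so the "other side" can be simulated deterministically.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head. */
struct ring_bh {
	int b_count;                  /* reference count */
	struct ring_bh *b_this_page;  /* circular per-page list */
};

/* Stand-in for schedule(): lets the reference holder make progress. */
typedef void (*yield_fn)(void *ctx);

/* Same shape as the loop in the patch: for each buffer on the ring,
 * yield until its refcount drops to zero. Returns how many times it
 * yielded. Like the patch, it has no upper bound on the wait. */
static unsigned long wait_for_bh_refs(struct ring_bh *head,
				      yield_fn yield, void *ctx)
{
	struct ring_bh *bh = head;
	unsigned long yields = 0;

	do {
		while (bh->b_count) {
			yield(ctx);
			yields++;
		}
	} while ((bh = bh->b_this_page) != head);

	return yields;
}

/* Simulated holder: each time it runs, it drops one reference on the
 * first buffer in the ring that still has one. */
static void drop_one_ref(void *ctx)
{
	struct ring_bh *head = ctx;
	struct ring_bh *bh = head;

	do {
		if (bh->b_count > 0) {
			bh->b_count--;
			return;
		}
	} while ((bh = bh->b_this_page) != head);
}
```

The sketch only terminates because drop_one_ref() is guaranteed to release a reference on every yield; the loop in the patch relies in the same way on whoever holds the bh ref dropping it.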


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html