Re: [PATCH] jbd jbd2: fix dio write returning EIO when try_to_release_page fails
From: Hisashi Hifumi <hidden>
Date: 2008-08-06 06:58:53
Also in:
linux-fsdevel
> diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1/fs/jbd/transaction.c
> --- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
> +++ linux-2.6.27-rc1/fs/jbd/transaction.c	2008-07-29 20:40:12.000000000 +0900
> @@ -1764,6 +1764,12 @@ int journal_try_to_free_buffers(journal_
>  	 */
>  	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
>  		journal_wait_for_transaction_sync_data(journal);
> +
> +		bh = head;
> +		do {
> +			while (atomic_read(&bh->b_count))
> +				schedule();
> +		} while ((bh = bh->b_this_page) != head);
>  		ret = try_to_free_buffers(page);
>  	}
>
> The loop is problematic. If the scheduler decides to keep running this
> task then we have a busy loop. If this task has realtime policy then it
> might even lock up the kernel.
>
> ocfs2 calls journal_try_to_free_buffers too, looping on b_count might not
> be the best idea there either. This code gets called from releasepage,
> which is used other places than the O_DIRECT invalidation paths, I'd be
> worried about performance problems here.
>
> try_to_release_page has a gfp_mask parameter. So when try_to_release_page
> is called from a performance-sensitive path, those gfp_mask bits should
> not be set. The b_count check loop is inside the
> (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS) check.
>
> Looks like try_to_free_pages will go into releasepage with wait & fs
> both set. This kind of change would make me very nervous.
>
> Hi Chris,
>
> The gfp_mask that try_to_free_pages() takes from its caller is passed
> down to try_to_release_page(). Based on the meaning of __GFP_WAIT and
> __GFP_FS, if the upper-level caller sets these two flags, I assume it
> expects a delay and is willing to wait for the fs to finish? But I agree
> that using a loop in journal_try_to_free_buffers() to wait for the busy
> bh to release the counter is expensive...
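
For anyone following the flag discussion above, a minimal userspace sketch of the gate may help. It is only an illustration: the SKETCH_* names and their bit values are hypothetical stand-ins, not the kernel's definitions of __GFP_WAIT and __GFP_FS. Only the shape of the test is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's __GFP_WAIT and __GFP_FS bits.
 * The values are illustrative only, not the real definitions. */
#define SKETCH_GFP_WAIT 0x10u
#define SKETCH_GFP_FS   0x80u

/* Returns true only when the caller permits both sleeping and filesystem
 * re-entry, mirroring the condition that guards the wait in the patch. */
bool sketch_may_wait(unsigned int gfp_mask)
{
	return (gfp_mask & SKETCH_GFP_WAIT) && (gfp_mask & SKETCH_GFP_FS);
}
```

With this gate, a caller that passes a mask lacking either bit never reaches the b_count loop. A caller that passes both bits does reach it, which is the case of concern in the replies above.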

I modified my patch. It still checks b_count in a loop, but now calls
set_current_state(TASK_UNINTERRUPTIBLE) before schedule() to mitigate the
loop. I think this avoids the busy loop. It is the same approach used by
do_sync_read()->wait_on_retry_sync_kiocb() and some drivers (qla2xxx).

Signed-off-by: Hisashi Hifumi <redacted>

diff -Nrup linux-2.6.27-rc1.org/fs/jbd/transaction.c linux-2.6.27-rc1.jbdfix/fs/jbd/transaction.c
--- linux-2.6.27-rc1.org/fs/jbd/transaction.c	2008-07-29 19:28:47.000000000 +0900
+++ linux-2.6.27-rc1.jbdfix/fs/jbd/transaction.c	2008-08-06 13:35:37.000000000 +0900
@@ -1764,6 +1764,15 @@ int journal_try_to_free_buffers(journal_
 	 */
 	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
 		journal_wait_for_transaction_sync_data(journal);
+
+		bh = head;
+		do {
+			while (atomic_read(&bh->b_count)) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule();
+				__set_current_state(TASK_RUNNING);
+			}
+		} while ((bh = bh->b_this_page) != head);
 		ret = try_to_free_buffers(page);
 	}
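
The do/while walk in the hunk relies on b_this_page linking the buffers of one page into a circular list. Below is a minimal userspace sketch of that traversal. It assumes a simplified, hypothetical struct (sketch_bh) that uses a plain int where the kernel's b_count is an atomic_t. It only reports whether a buffer is still referenced and does not attempt to wait.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct buffer_head: only the two
 * fields the loop touches. In the kernel, b_count is an atomic_t. */
struct sketch_bh {
	int b_count;                    /* outstanding references */
	struct sketch_bh *b_this_page;  /* next buffer on the same page (circular) */
};

/* Visits every buffer on the page exactly once, starting and ending at
 * head, and reports whether any of them is still referenced. */
bool sketch_any_buffer_busy(const struct sketch_bh *head)
{
	const struct sketch_bh *bh = head;

	do {
		if (bh->b_count != 0)
			return true;
	} while ((bh = bh->b_this_page) != head);

	return false;
}
```

A page with a single buffer is a ring of length one (b_this_page points back at the buffer itself), so the body still runs exactly once.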