Thread: 5 messages, 2 authors, 2011-02-17

Re: [2.6.32 ubuntu] I/O hang at start_this_handle

From: Jan Kara <jack@suse.cz>
Date: 2011-02-17 15:38:50
Also in: linux-fsdevel

  Hello,

On Thu 17-02-11 17:13:43, Tetsuo Handa wrote:
Jan Kara wrote:
> You can verify this by looking at disassembly of start_this_handle() in your
> kernel and finding out where offset 0x22d is in the function...
I confirmed that the function

  [<c02c7ead>] start_this_handle+0x22d/0x390

is the one in fs/jbd/transaction.o .

c02c7ea4:       eb 07                   jmp    c02c7ead <start_this_handle+0x22d>
c02c7ea6:       66 90                   xchg   %ax,%ax
c02c7ea8:       e8 93 a6 2e 00          call   c05b2540 <schedule>
c02c7ead:       89 d8                   mov    %ebx,%eax
c02c7eaf:       b9 02 00 00 00          mov    $0x2,%ecx
c02c7eb4:       8d 55 e0                lea    -0x20(%ebp),%edx
c02c7eb7:       e8 d4 82 ea ff          call   c0170190 <prepare_to_wait>
c02c7ebc:       8b 46 18                mov    0x18(%esi),%eax
c02c7ebf:       85 c0                   test   %eax,%eax
c02c7ec1:       75 e5                   jne    c02c7ea8 <start_this_handle+0x228>
c02c7ec3:       8b 45 cc                mov    -0x34(%ebp),%eax
c02c7ec6:       8d 55 e0                lea    -0x20(%ebp),%edx
c02c7ec9:       e8 e2 81 ea ff          call   c01700b0 <finish_wait>
c02c7ece:       e9 08 fe ff ff          jmp    c02c7cdb <start_this_handle+0x5b>
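
The offsets in the listing can be cross-checked with a little arithmetic. The
sketch below is only an illustration: the constants are copied from the
backtrace line and the objdump output above, while the helper names
(func_start, addr_of, verify_listing) are made up for this note. It assumes
the backtrace entry is a return address, which is consistent with the 5-byte
call to schedule ending exactly at c02c7ead.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from "[<c02c7ead>] start_this_handle+0x22d/0x390". */
#define RET_ADDR   0xc02c7eadu  /* address reported in the backtrace */
#define RET_OFFSET 0x22du       /* offset of that address in the function */
#define FUNC_SIZE  0x390u       /* total size of start_this_handle */

/* Start of the function: reported address minus reported offset. */
static uint32_t func_start(void)
{
    return RET_ADDR - RET_OFFSET;
}

/* Translate "start_this_handle+offset" into an absolute address. */
static uint32_t addr_of(uint32_t offset)
{
    return func_start() + offset;
}

/* Check the symbolic branch targets printed by objdump against the math. */
static void verify_listing(void)
{
    assert(func_start() == 0xc02c7c80u);
    assert(addr_of(0x228) == 0xc02c7ea8u);  /* jne target: call schedule */
    assert(addr_of(0x5b) == 0xc02c7cdbu);   /* target of the final jmp */
    assert(addr_of(FUNC_SIZE) == 0xc02c8010u);
    /* "e8 93 a6 2e 00" is 5 bytes, so the call returns to RET_ADDR. */
    assert(0xc02c7ea8u + 5u == RET_ADDR);
}
```

If verify_listing() passes, +0x22d is the instruction right after the call to
schedule, which suggests the task was sleeping inside schedule() at that point.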

The location in that function is

        /* Wait on the journal's transaction barrier if necessary */
        if (journal->j_barrier_count) {
                spin_unlock(&journal->j_state_lock);
                wait_event(journal->j_wait_transaction_locked,
                                journal->j_barrier_count == 0);
                goto repeat;
        }

. (Disassembly with mixed code attached at the bottom.)
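
The loop in the listing (prepare_to_wait, test, schedule, and finally
finish_wait) has the shape one would expect wait_event() to expand to. The
following is a minimal single-threaded userspace model of that shape, not
kernel code: every function here is a hypothetical stand-in, and the "scheduler"
simply pretends that one barrier holder finishes each time the waiter sleeps.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. */
static int barrier_count;   /* models journal->j_barrier_count */
static char trace[64];      /* call order: p = prepare, s = schedule, f = finish */

static void note(char c)
{
    size_t n = strlen(trace);
    assert(n + 1 < sizeof(trace));
    trace[n] = c;
    trace[n + 1] = '\0';
}

static void prepare_to_wait_stub(void) { note('p'); }
static void finish_wait_stub(void)     { note('f'); }

/* Pretend one barrier holder drops the barrier while we are asleep. */
static void schedule_stub(void)
{
    note('s');
    if (barrier_count > 0)
        barrier_count--;
}

/* Same shape as the disassembled loop: prepare, test, schedule, repeat. */
static void wait_for_barrier(void)
{
    for (;;) {
        prepare_to_wait_stub();     /* call prepare_to_wait */
        if (barrier_count == 0)     /* mov 0x18(%esi),%eax; test; jne */
            break;
        schedule_stub();            /* call schedule, resumes at +0x22d */
    }
    finish_wait_stub();             /* call finish_wait */
}

/* Run the model from a given starting count and return the call trace. */
static const char *run_model(int initial_count)
{
    trace[0] = '\0';
    barrier_count = initial_count;
    wait_for_barrier();
    return trace;
}
```

In this model the waiter only leaves the loop when it is both woken up and sees
a zero count. A real hang means one of those two things never happened.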
  Great, thanks for the analysis.
> But in this case - does the process (sh) eventually resume or is it stuck
> forever?
I waited for a few hours but the process did not resume. Thus, I gave up.
OK, so stuck forever ;). Interesting. So we probably missed a wakeup
somehow or j_barrier_count got corrupted. I suppose you are not able to
reproduce the hang, are you?  Looking at the code, it looks safe and I have
no clue how it could happen. So unless you are able to see the issue again
(so that we can gather some more debug information), I'm not able to help...
I'm sorry.
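
The two hypotheses above (a missed wakeup versus a wrong j_barrier_count) would
look different if the counter could be inspected while the task is stuck. The
toy model below is a sketch under that assumption. It is not JBD code: the
struct, the lock_updates/unlock_updates helpers and the verdict names are all
hypothetical and only loosely mirror the increment/decrement-and-wake pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
struct model {
    int  barrier_count;   /* models journal->j_barrier_count */
    bool waiter_asleep;   /* models the task parked in schedule() */
};

enum verdict { RUNNING, MISSED_WAKEUP, COUNT_STUCK };

/* The waiter sleeps whenever it sees a non-zero count. */
static void waiter_check(struct model *m)
{
    m->waiter_asleep = (m->barrier_count != 0);
}

/* A barrier holder takes the barrier. */
static void lock_updates(struct model *m)
{
    m->barrier_count++;
}

/* A barrier holder drops it; wake == false models a lost wakeup. */
static void unlock_updates(struct model *m, bool wake)
{
    assert(m->barrier_count > 0);
    m->barrier_count--;
    if (wake && m->waiter_asleep)
        waiter_check(m);    /* a woken task re-tests the condition */
}

/* What an observer would conclude from the final state. */
static enum verdict diagnose(const struct model *m)
{
    if (!m->waiter_asleep)
        return RUNNING;
    return m->barrier_count == 0 ? MISSED_WAKEUP : COUNT_STUCK;
}

/* extra_locks > 0 models a leaked or corrupted count. */
static enum verdict scenario(int extra_locks, bool wake)
{
    struct model m = { 0, false };
    int i;

    lock_updates(&m);
    for (i = 0; i < extra_locks; i++)
        lock_updates(&m);
    waiter_check(&m);           /* waiter sees count != 0 and sleeps */
    unlock_updates(&m, wake);
    return diagnose(&m);
}
```

In the model, a sleeping waiter with a zero count points at a missed wakeup,
while a non-zero count points at the counter itself. Whether the real hang fits
either case cannot be told without reproducing it.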

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .