Thread (39 messages), 4 authors, 2012-07-17

Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!

From: Chris Mason <hidden>
Date: 2012-07-16 15:43:04
Also in: linux-fsdevel, lkml

On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:
On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote: 
On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
Greetings,
[ deadlocks with btrfs and the recent RT kernels ]

I talked with Thomas about this and I think the problem is the
single-reader nature of the RT rwlocks.  The lockdep report below
mentions that btrfs is calling:
[  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
In this case, the task has a number of blocking read locks on the btrfs buffers,
and we're trying to turn them back into spinning read locks.  Even
though btrfs is taking the read rwlock, it doesn't think of this as a new
lock operation because we were blocking out new writers.

If the second task has taken the spinning read lock, it is going to
prevent that clear_path_blocking operation from progressing, even though
it would have worked on a non-RT kernel.

The solution should be to make the blocking read locks in btrfs honor the
single-reader semantics.  This means not allowing more than one blocking
reader and not allowing a spinning reader when there is a blocking
reader.  Strictly speaking btrfs shouldn't need recursive readers on a
single lock, so I wouldn't worry about that part.
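
The admission rules described above can be sketched as a small state model.
This is only an illustration under stated assumptions: the struct, its
fields, and the model_* function names are hypothetical and not taken from
btrfs, and the model is single-threaded, so it shows who may be admitted
rather than how a real lock would make those checks atomic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one tree-block lock. Names are illustrative only.
 * Single-threaded: it models admission rules, not a real lock.
 */
struct eb_lock_model {
	bool writer;           /* a writer holds the block */
	bool spinning_reader;  /* a reader holds the single-reader rwlock */
	bool blocking_reader;  /* a reader dropped the rwlock but still pins the block */
};

/* A new reader is admitted only if nobody holds the block in any mode. */
static bool model_read_trylock(struct eb_lock_model *l)
{
	if (l->writer || l->spinning_reader || l->blocking_reader)
		return false;
	l->spinning_reader = true;
	return true;
}

/* Spinning -> blocking: the holder may now sleep; newcomers stay excluded. */
static void model_set_blocking_read(struct eb_lock_model *l)
{
	assert(l->spinning_reader && !l->blocking_reader);
	l->spinning_reader = false;
	l->blocking_reader = true;
}

/*
 * Blocking -> spinning: no second reader could get in while we were
 * blocking, so this conversion never has to wait for one.
 */
static bool model_clear_blocking_read(struct eb_lock_model *l)
{
	assert(l->blocking_reader);
	if (l->spinning_reader)   /* cannot happen under the rules above */
		return false;
	l->blocking_reader = false;
	l->spinning_reader = true;
	return true;
}

/* Drop a spinning read lock. */
static void model_read_unlock(struct eb_lock_model *l)
{
	assert(l->spinning_reader);
	l->spinning_reader = false;
}
```

With these rules a second model_read_trylock() fails while the first reader
is either spinning or blocking, which is what lets the later conversion back
to a spinning lock proceed without waiting on another reader.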

There is also a chunk of code in btrfs_clear_path_blocking that makes
sure to strictly honor top down locking order during the conversion.  It
only does this when lockdep is enabled because in non-RT kernels we
don't need to worry about it.  For RT we'll want to enable that as well.
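
A minimal sketch of what a strictly top-down conversion looks like, assuming
a path that records one lock per tree level with higher indexes closer to
the root. The struct, the MODEL_MAX_LEVEL constant, and the function name
are hypothetical stand-ins for illustration, not the btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 8   /* hypothetical limit on tree height */

/* Hypothetical path: one slot per level, index 0 is the leaf. */
struct path_model {
	bool blocking[MODEL_MAX_LEVEL];  /* blocking read lock held at this level */
};

/*
 * Convert held locks from blocking back to spinning, root first.
 * Records the visited levels in order[] and returns how many were converted.
 */
static size_t model_clear_path_blocking(struct path_model *p,
					int order[MODEL_MAX_LEVEL])
{
	size_t n = 0;

	assert(p != NULL && order != NULL);
	for (int level = MODEL_MAX_LEVEL - 1; level >= 0; level--) {
		if (!p->blocking[level])
			continue;          /* nothing held at this level */
		p->blocking[level] = false;
		order[n++] = level;
	}
	return n;
}
```

Walking from the highest held level down to the leaf means the conversion
retakes locks in the same order a tree descent would take them.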

I'll give this a shot later today.

I took a poke at it.  Did I do something similar to what you had in
mind, or just hide behind performance stealing paranoid trylock loops?
Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
bat, so it gets posted despite skepticism.

Seems btrfs isn't entirely convinced either.

[ 2292.336229] use_block_rsv: 1810 callbacks suppressed
[ 2292.336231] ------------[ cut here ]------------
[ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
[ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
[ 2292.336259] btrfs: block rsv returned -28

This is unrelated.  You got far enough into the benchmark to hit an
ENOSPC warning.  This can be ignored (I just deleted it when we used 3.0
for oracle).
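
For reference, the -28 in the log is a negated errno value, and on Linux
errno 28 is ENOSPC.  A small user-space sketch of decoding such a return
code follows; the helper names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Kernel code conventionally reports errors as negative errno values. */
static bool is_enospc(int ret)
{
	return ret == -ENOSPC;
}

/* Turn a kernel-style return code into a human-readable string. */
static const char *describe_ret(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling describe_ret(-28) returns the C library's text for ENOSPC.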

re: dbench performance.  dbench tends to penalize fairness.  I can
imagine RT making it slower in general.

It also triggers lots of lock contention in btrfs because the dataset is
fairly small and the trees don't fan out a lot.

-chris