Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
From: Mike Galbraith <hidden>
Date: 2012-07-13 10:14:45
Also in:
linux-fsdevel, lkml
On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> > > Bingo, that makes it more likely that this is caused by copying w/o initializing the lock and then freeing the original structure. A quick check for memcpy finds that __btrfs_close_devices() does a memcpy of btrfs_device structs w/o initializing the lock in the new copy, but I have no idea whether that's the place we are looking for.
> >
> > Thanks a bunch Thomas. I doubt I would have ever figured out that lala land resulted from _copying_ a lock. That's one I won't be forgetting any time soon. Box not only survived a few thousand xfstests 006 runs, dbench seemed disinterested in deadlocking virgin 3.0-rt.
>
> Cute. I think that the lock copying caused the deadlock problem as the list pointed to the wrong place, so we might have ended up following the wrong chain when walking the list as long as the original struct was not freed. That beast is freed under RCU, so there could be an RCU read side critical section fiddling with the old lock and causing utter confusion.
Virgin 3.0-rt appears to really be solid. But then it doesn't have pesky rwlocks.
> /me goes and writes a nastigram^W proper changelog
>
> > btrfs still locks up in my enterprise kernel, so I suppose I had better plug your fix into 3.4-rt and see what happens, and go beat hell out of virgin 3.0-rt again to be sure box really really survives dbench.
>
> A test against 3.4-rt sans enterprise mess might be nice as well.
Enterprise is 3.0-stable with um 555 btrfs patches (oh dear). Virgin 3.4-rt and 3.2-rt deadlock gripe. Enterprise doesn't gripe, but deadlocks, so I have another adventure in my future even if I figure out wth to do about rwlocks.

-Mike