Re: Linux 3.1-rc9
From: Yong Zhang <hidden>
Date: 2011-10-26 01:48:12
On Tue, Oct 25, 2011 at 08:26:31AM -0700, Simon Kirby wrote:
> On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:
>
> > On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner [off-list ref] wrote:
> > >
> > > It does not look related.
> >
> > Yeah, the only lock held there seems to be the socket lock, and it
> > looks like all CPU's are spinning on it.
> >
> > > Could you try to reproduce that problem with lockdep enabled? lockdep
> > > might make it go away, but it's definitely worth a try.
> >
> > And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> > some odd networking thing. It sounds unlikely, but maybe some error
> > case you get into doesn't release the socket lock.
> >
> > I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> > lock thing is separate, iirc.
>
> I think the config option you were trying to think of is
> CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.
>
> By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:
>
>         /*
>          * We can walk the hash lockfree, because the hash only
>          * grows, and we are careful when adding entries to the end:
>          */
>         list_for_each_entry(class, hash_head, hash_entry) {
>                 if (class->key == key) {
>                         WARN_ON_ONCE(class->name != lock->name);

Someone has hit this before, maybe you can try the patch in:
http://marc.info/?l=linux-kernel&m=131919035525533

Thanks,
Yong

>                         return class;
>                 }
>         }
>
[19274.691090] ------------[ cut here ]------------
[19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
[19274.691112] Hardware name: PowerEdge 2950
[19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
[19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
[19274.691141] Call Trace:
[19274.691149] [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
[19274.691156] [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
[19274.691163] [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
[19274.691169] [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
[19274.691175] [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
[19274.691181] [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[19274.691185] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691191] [<ffffffff813a4f8a>] ? __delay+0xa/0x10
[19274.691197] [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
[19274.691201] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691205] [<ffffffff8104a302>] double_rq_lock+0x52/0x80
[19274.691210] [<ffffffff81058167>] load_balance+0x897/0x16e0
[19274.691215] [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
[19274.691219] [<ffffffff8104d172>] ? update_shares+0xd2/0x150
[19274.691226] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691232] [<ffffffff816f2608>] __schedule+0x8d8/0xa20
[19274.691238] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691243] [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
[19274.691249] [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
[19274.691254] [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
[19274.691258] [<ffffffff816f282a>] schedule+0x3a/0x60
[19274.691265] [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
[19274.691270] [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
[19274.691276] [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
[19274.691282] [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
[19274.691291] [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
[19274.691297] [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
[19274.691303] [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
[19274.691309] [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
[19274.691317] [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
[19274.691323] [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
[19274.691330] [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
[19274.691336] [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
[19274.691343] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691351] [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[19274.691356] [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
[19274.691363] [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
[19274.691370] [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
[19274.691376] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691383] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691390] [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
[19274.691396] [<ffffffff8113e647>] sys_poll+0x77/0x110
[19274.691402] [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
[19274.691407] ---[ end trace 74fbaae9066aadcc ]---
Simon-
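
For anyone following the quoted loop, here is a minimal user-space sketch
of the same kind of check. It is not the kernel's code: demo_class,
demo_lock, demo_look_up and the two counters are hypothetical names made
up for illustration, and a plain singly linked chain stands in for the
kernel's hash list. It only shows the idea that a class is found by key,
and that a warning fires once if the name recorded for that key is not
the same pointer as the name of the lock being looked up.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a registered lock class (not struct lock_class). */
struct demo_class {
    const void *key;          /* identity used for the look-up */
    const char *name;         /* name recorded when the class was registered */
    struct demo_class *next;  /* next entry on the same hash chain */
};

/* Hypothetical stand-in for the lock instance being acquired. */
struct demo_lock {
    const void *key;
    const char *name;
};

int demo_warned_once;  /* emulates the "once" part of WARN_ON_ONCE */
int demo_warn_count;   /* number of warnings actually emitted */

/* Walk the chain and return the class whose key matches, or NULL. */
struct demo_class *demo_look_up(struct demo_class *head,
                                const struct demo_lock *lock)
{
    struct demo_class *c;

    for (c = head; c != NULL; c = c->next) {
        if (c->key != lock->key)
            continue;
        /* Pointer comparison of the names, as in the quoted snippet. */
        if (c->name != lock->name && !demo_warned_once) {
            demo_warned_once = 1;
            demo_warn_count++;
            fprintf(stderr, "warning: class \"%s\" looked up as \"%s\"\n",
                    c->name, lock->name);
        }
        return c;
    }
    return NULL;
}
```

In this sketch a mismatch means that two lock instances share one key but
carry different name pointers. The class is still returned, and the
warning is printed only the first time.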
--
Only stand for myself