Thread: 98 messages, 20 authors, 2011-11-18

Re: Linux 3.1-rc9

From: Yong Zhang <hidden>
Date: 2011-10-26 01:48:12

On Tue, Oct 25, 2011 at 08:26:31AM -0700, Simon Kirby wrote:
On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:
On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner [off-list ref] wrote:
It does not look related.
Yeah, the only lock held there seems to be the socket lock, and it
looks like all CPU's are spinning on it.
Could you try to reproduce that problem with
lockdep enabled? lockdep might make it go away, but it's definitely
worth a try.
And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
some odd networking thing.  It sounds unlikely, but maybe some error
case you get into doesn't release the socket lock.

I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
lock thing is separate, iirc.
I think the config option you were trying to think of is
CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.

By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:

       /*
        * We can walk the hash lockfree, because the hash only
        * grows, and we are careful when adding entries to the end:
        */
       list_for_each_entry(class, hash_head, hash_entry) {
               if (class->key == key) {
                       WARN_ON_ONCE(class->name != lock->name);
Someone has hit this before, maybe you can try the patch in:
http://marc.info/?l=linux-kernel&m=131919035525533

Thanks,
Yong
                       return class;
               }
       }

[19274.691090] ------------[ cut here ]------------
[19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
[19274.691112] Hardware name: PowerEdge 2950
[19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
[19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
[19274.691141] Call Trace:
[19274.691149]  [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
[19274.691156]  [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
[19274.691163]  [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
[19274.691169]  [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
[19274.691175]  [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
[19274.691181]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[19274.691185]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691191]  [<ffffffff813a4f8a>] ? __delay+0xa/0x10
[19274.691197]  [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
[19274.691201]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691205]  [<ffffffff8104a302>] double_rq_lock+0x52/0x80
[19274.691210]  [<ffffffff81058167>] load_balance+0x897/0x16e0
[19274.691215]  [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
[19274.691219]  [<ffffffff8104d172>] ? update_shares+0xd2/0x150
[19274.691226]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691232]  [<ffffffff816f2608>] __schedule+0x8d8/0xa20
[19274.691238]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691243]  [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
[19274.691249]  [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
[19274.691254]  [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
[19274.691258]  [<ffffffff816f282a>] schedule+0x3a/0x60
[19274.691265]  [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
[19274.691270]  [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
[19274.691276]  [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
[19274.691282]  [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
[19274.691291]  [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
[19274.691297]  [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
[19274.691303]  [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
[19274.691309]  [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
[19274.691317]  [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
[19274.691323]  [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
[19274.691330]  [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
[19274.691336]  [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
[19274.691343]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691351]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[19274.691356]  [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
[19274.691363]  [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
[19274.691370]  [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
[19274.691376]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691383]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691390]  [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
[19274.691396]  [<ffffffff8113e647>] sys_poll+0x77/0x110
[19274.691402]  [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
[19274.691407] ---[ end trace 74fbaae9066aadcc ]---
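
To make the check behind this warning concrete, here is a minimal userspace sketch in C of a hash-chain lookup like the one in the excerpt above. It is not the kernel's code: `class_sketch`, `lock_sketch`, `look_up_class_sketch` and `name_mismatches` are hypothetical names, the chain is a plain singly linked list rather than the kernel's list type, and a counter plus a message on stderr stands in for WARN_ON_ONCE. As in the excerpt, names are compared by pointer, not by string content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a registered lock class. */
struct class_sketch {
	const void *key;            /* identity of the class */
	const char *name;           /* name recorded when the class was added */
	struct class_sketch *next;  /* next entry in the same hash chain */
};

/* Hypothetical stand-in for the lock instance being looked up. */
struct lock_sketch {
	const void *key;
	const char *name;
};

/* Counts key matches whose name pointer differed from the lock's. */
static unsigned int name_mismatches;

/*
 * Walk one hash chain and return the entry whose key matches lock->key,
 * or NULL if there is none. A key match with a different name pointer is
 * still returned, but it is counted and reported, which is the condition
 * the WARN_ON_ONCE in the excerpt tests for.
 */
static struct class_sketch *
look_up_class_sketch(struct class_sketch *head, const struct lock_sketch *lock)
{
	assert(lock != NULL);

	for (struct class_sketch *c = head; c != NULL; c = c->next) {
		if (c->key == lock->key) {
			if (c->name != lock->name) {  /* pointer comparison */
				name_mismatches++;
				fprintf(stderr,
					"class name \"%s\" != lock name \"%s\"\n",
					c->name ? c->name : "(null)",
					lock->name ? lock->name : "(null)");
			}
			return c;
		}
	}
	return NULL;
}
```

Looking up a lock whose key matches an entry registered under a different name pointer still returns that entry but increments `name_mismatches`; that is the situation the WARNING above reports. The sketch says nothing about why the two names differ in the trace.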

Simon-
-- 
Only stand for myself