Thread: 34 messages, 8 authors, 2011-07-13

Re: Kernel crash after using new Intel NIC (igb)

From: Arun Sharma <hidden>
Date: 2011-05-24 21:33:29
Also in: lkml

On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
> Probably not.
>
> What gives slub_nomerge=1 for you ?

It took me a while to get a new kernel on a large enough sample
of machines to get some data.

Like you observed in the other thread, this is unlikely to be a random
memory corruption.

The panics stopped after we moved the list_empty() check under the lock:

--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,11 @@ void __init inet_initpeers(void)
 /* Called with or without local BH being disabled. */
 static void unlink_from_unused(struct inet_peer *p)
 {
+	spin_lock_bh(&unused_peers.lock);
 	if (!list_empty(&p->unused)) {
-		spin_lock_bh(&unused_peers.lock);
 		list_del_init(&p->unused);
-		spin_unlock_bh(&unused_peers.lock);
 	}
+	spin_unlock_bh(&unused_peers.lock);
 }
 
 static int addr_compare(const struct inetpeer_addr *a,

The idea is that the list gets corrupted by some kind of race
condition. Two threads racing on list_empty() and then both executing
list_del_init() seems harmless, since the deletes themselves are
serialized by the lock.
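
To illustrate why a repeated list_del_init() looks harmless when the two
calls are serialized: below is a minimal userspace sketch, assuming
simplified stand-ins for the kernel's list helpers. The names
(init_list_head, list_is_empty, list_add_tail_entry, list_del_init_entry,
double_delete_is_harmless) are hypothetical and are not the kernel API. It
does not model concurrency, only the state of the list after two
back-to-back deletes.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized node points at itself, which is what "empty" means. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry from its neighbours, then make it self-pointing again. */
static void list_del_init_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Returns true if deleting the same entry twice leaves the list intact. */
static bool double_delete_is_harmless(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	init_list_head(&a);
	init_list_head(&b);
	list_add_tail_entry(&a, &head);
	list_add_tail_entry(&b, &head);

	list_del_init_entry(&a);	/* first caller wins the lock */
	list_del_init_entry(&a);	/* second caller: a only rewrites itself */

	return list_is_empty(&a) &&
	       head.next == &b && head.prev == &b &&
	       b.next == &head && b.prev == &head;
}
```

After the first delete the node is self-pointing, so the second delete
only touches the node itself and leaves its former neighbours alone.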

There is probably a different race condition that is mitigated by doing
the list_empty() check under the lock.

 -Arun