Re: Kernel crash after using new Intel NIC (igb)

From: Arun Sharma <hidden>
Date: 2011-05-24 21:33:29
Also in: lkml

On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:

> Probably not.
>
> What gives slub_nomerge=1 for you?

It took me a while to get a new kernel on a large enough sample
of machines to get some data.

Like you observed in the other thread, this is unlikely to be a random
memory corruption.

The panics stopped after we moved the list_empty() check under the lock.

--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,11 @@ void __init inet_initpeers(void)
 /* Called with or without local BH being disabled. */
 static void unlink_from_unused(struct inet_peer *p)
 {
+	spin_lock_bh(&unused_peers.lock);
 	if (!list_empty(&p->unused)) {
-		spin_lock_bh(&unused_peers.lock);
 		list_del_init(&p->unused);
-		spin_unlock_bh(&unused_peers.lock);
 	}
+	spin_unlock_bh(&unused_peers.lock);
 }
 
 static int addr_compare(const struct inetpeer_addr *a,

The idea is that the list gets corrupted under some kind of race
condition. Two threads racing on list_empty() and both executing
list_del_init() seems harmless, though, since the deletion itself was
already serialized by the lock.

There is probably a different race condition that is mitigated by doing
the list_empty() check under the lock.
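
For illustration only, here is a minimal user-space sketch of the same pattern: the emptiness test and the unlink happen inside one critical section. It is not the kernel code. A pthread mutex stands in for spin_lock_bh(), and every name in it (struct list_node, link_to_unused(), unlink_from_unused_locked(), and so on) is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on the kernel's
 * struct list_head. All names here are hypothetical. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* A node that points to itself is not linked into any list. */
static bool list_is_empty(const struct list_node *n)
{
    return n->next == n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialize; on an already unlinked node this is a no-op. */
static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Stand-in for unused_peers: a list head plus the lock protecting it. */
struct unused_list {
    struct list_node head;
    pthread_mutex_t lock;
};

static struct unused_list unused = {
    .head = { &unused.head, &unused.head },
    .lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Hypothetical helper: queue a node on the unused list if it is not linked. */
static void link_to_unused(struct list_node *n)
{
    pthread_mutex_lock(&unused.lock);
    if (list_is_empty(n))
        list_add_tail_node(n, &unused.head);
    pthread_mutex_unlock(&unused.lock);
}

/* Test and unlink inside one critical section, so no other thread can
 * change the node's links between the check and the deletion.
 * Returns true if the node was actually removed. */
static bool unlink_from_unused_locked(struct list_node *n)
{
    bool removed = false;

    pthread_mutex_lock(&unused.lock);
    if (!list_is_empty(n)) {
        list_del_init_node(n);
        removed = true;
    }
    pthread_mutex_unlock(&unused.lock);
    return removed;
}
```

Calling unlink_from_unused_locked() a second time on the same node returns false and leaves the list untouched, which matches the observation that a repeated list_del_init() is harmless. The cost is taking the lock even when the node is not on the list.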

 -Arun
