Re: Kernel crash after using new Intel NIC (igb)
From: Arun Sharma <hidden>
Date: 2011-05-24 21:33:29
Also in:
lkml
On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
> Probably not. What gives slub_nomerge=1 for you ?
It took me a while to get a new kernel on a large enough sample of machines to get some data. Like you observed in the other thread, this is unlikely to be a random memory corruption. The panics stopped after we moved the list_empty() check under the lock.
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,11 @@ void __init inet_initpeers(void)
 /* Called with or without local BH being disabled. */
 static void unlink_from_unused(struct inet_peer *p)
 {
+	spin_lock_bh(&unused_peers.lock);
 	if (!list_empty(&p->unused)) {
-		spin_lock_bh(&unused_peers.lock);
 		list_del_init(&p->unused);
-		spin_unlock_bh(&unused_peers.lock);
 	}
+	spin_unlock_bh(&unused_peers.lock);
 }
 
 static int addr_compare(const struct inetpeer_addr *a,
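For illustration only, here is a minimal userspace sketch of the same "check under the lock" pattern. It assumes a pthread mutex in place of spin_lock_bh() and a hand-rolled circular list in place of the kernel's list.h; all names below (struct peer, add_to_unused(), the bool return value) are hypothetical and not taken from inetpeer.c.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and leave it self-linked, so a repeated unlink is a no-op. */
static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Stand-in for unused_peers: a list head plus the lock protecting it. */
struct unused_list {
    struct list_node head;
    pthread_mutex_t lock;
};

static struct unused_list unused = {
    .head = { &unused.head, &unused.head },
    .lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Hypothetical peer object; only the list linkage matters here. */
struct peer {
    struct list_node unused;
};

static void add_to_unused(struct peer *p)
{
    pthread_mutex_lock(&unused.lock);
    list_add_tail_node(&p->unused, &unused.head);
    pthread_mutex_unlock(&unused.lock);
}

/*
 * Both the emptiness check and the unlink happen with the lock held,
 * so no other thread can modify the list between the two steps.
 * Returns true if the peer was actually on the list (assumption: the
 * kernel function returns void; the bool only makes this testable).
 */
static bool unlink_from_unused(struct peer *p)
{
    bool removed = false;

    pthread_mutex_lock(&unused.lock);
    if (!list_is_empty(&p->unused)) {
        list_del_init_node(&p->unused);
        removed = true;
    }
    pthread_mutex_unlock(&unused.lock);
    return removed;
}
```

The pre-patch variant would read p->unused outside the lock, so its answer could already be stale by the time the lock is taken. The sketch only shows the shape of the locking; it does not reproduce the panic.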
The idea is that the list gets corrupted by some kind of race condition. Two threads racing on list_empty() and then both executing list_del_init() seems harmless. There is probably a different race condition that is mitigated by doing the list_empty() check under the lock.

 -Arun