Thread: 34 messages, 8 authors, 2011-07-13

Re: Kernel crash after using new Intel NIC (igb)

From: Arun Sharma <hidden>
Date: 2011-05-26 19:29:45
Also in: lkml

On 5/24/11 11:35 PM, Eric Dumazet wrote:
>> Another possibility is to do the list_empty() check twice. Once without
>> taking the lock and again with the spinlock held.
> Why ?

Part of the problem is that I don't have a precise understanding of the 
race condition that's causing the list to become corrupted.

All I know is that doing the check under the lock fixes it. If that is
slowing things down, we can do a check outside the lock first (since
it's cheap). But because that unlocked answer can be wrong, we verify
it again under the lock.
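
To make the check-twice idea concrete, here is a rough userspace sketch. It is not the kernel code: the list helpers, struct peer, unused_lock, and unlink_from_unused() below are stand-ins I made up for illustration, and a pthread mutex takes the place of the spinlock. The unlocked read is also a plain load here; real code would need whatever primitive makes that read safe.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and leave it self-linked, so a second call is a harmless no-op. */
static void list_del_init_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical peer node; only the list linkage matters for this sketch. */
struct peer {
    struct list_head unused;
};

static struct list_head unused_peers = { &unused_peers, &unused_peers };
static pthread_mutex_t unused_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true if this call removed p from the unused list. */
static bool unlink_from_unused(struct peer *p)
{
    bool removed = false;

    /* First check, no lock held: cheap, but the answer may be stale. */
    if (list_is_empty(&p->unused))
        return false;

    pthread_mutex_lock(&unused_lock);
    /* Second check, lock held: only this answer is acted on. */
    if (!list_is_empty(&p->unused)) {
        list_del_init_node(&p->unused);
        removed = true;
    }
    pthread_mutex_unlock(&unused_lock);
    return removed;
}
```

Note that this only protects against a stale "linked" answer. If the unlocked check says "not linked" and that is stale, the sketch returns early without ever taking the lock, so it does not by itself close every window.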

> list_del_init(&p->unused); (done under lock of course) is safe, you can
> call it twice, no problem.

Doing it twice is not a problem. But doing it when we shouldn't be doing 
it could be the problem.

The list modification under unused_peers.lock looks generally safe. But 
the control flow (based on refcnt) done outside the lock might have races.

E.g., inet_putpeer() might see the refcnt drop to zero, but before it
adds the node to the unused list, another thread may call inet_getpeer()
and set the refcnt back to 1. We then end up with a node that's
potentially in use, but sits on the unused list.
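
The interleaving I have in mind can be replayed deterministically in a single thread. This is only a model of the suspected race, not the actual inetpeer code: putpeer_drop_ref(), putpeer_link_unused(), getpeer_take_ref(), and replay_race() are hypothetical names, a plain int stands in for the real reference counter, and locking is left out because the steps run one after another.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical peer node: a reference count plus the unused-list linkage. */
struct peer {
    int refcnt;
    struct list_head unused;
};

static struct list_head unused_peers = { &unused_peers, &unused_peers };

/* Put path, step 1: drop a reference and report whether it hit zero. */
static bool putpeer_drop_ref(struct peer *p) { return --p->refcnt == 0; }

/* Put path, step 2: act on the earlier "hit zero" decision. */
static void putpeer_link_unused(struct peer *p)
{
    list_add_tail_node(&p->unused, &unused_peers);
}

/* Get path: take a reference and unlink the node if it is currently listed. */
static void getpeer_take_ref(struct peer *p)
{
    p->refcnt++;
    if (!list_is_empty(&p->unused))
        list_del_init_node(&p->unused);
}

/* Returns true if the node ends up referenced AND on the unused list. */
static bool replay_race(void)
{
    struct peer p = { .refcnt = 1 };
    bool hit_zero, bad;

    list_init(&p.unused);
    list_init(&unused_peers);

    hit_zero = putpeer_drop_ref(&p);  /* thread A: refcnt 1 -> 0 */
    getpeer_take_ref(&p);             /* thread B: refcnt 0 -> 1, nothing to unlink yet */
    if (hit_zero)
        putpeer_link_unused(&p);      /* thread A: acts on its stale decision */

    bad = (p.refcnt == 1) && !list_is_empty(&p.unused);

    /* p lives on the stack, so unlink it before returning. */
    list_del_init_node(&p.unused);
    return bad;
}
```

In this model replay_race() returns true: the get path runs inside the window between the put path's two steps, finds nothing to unlink, and the put path then lists a node that has a live reference. Each list operation on its own is fine; the problem is the decision made on refcnt before the node is linked.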

  -Arun