Thread: 14 messages, 3 authors, 2006-11-16

Re: watchdog timeout panic in e1000 driver

From: Kenzo Iwami <hidden>
Date: 2006-11-01 13:21:45

Hi,
Even if the total lock time can be reduced, it's still possible for the
interrupt handler to run while the interrupted code is holding the
semaphore.
I think your method only decreases the frequency of this problem.
Why does reducing the lock time solve this problem?
There are several problems here that need addressing. It's not acceptable 
for our driver to wait up to 15 seconds, and we can (presumably) reduce it 
to milliseconds, so that would help a lot. We should in no case at all hold 
it for any period longer than (give or take) half a second, so working 
towards that is a very good step in the right direction.

Adding the timer task back may also help, as we are no longer trying to 
acquire the sw_fw_semaphore in interrupt context, but we removed it for a 
reason, and I need to dig up what reason this exactly was before we can 
revert it. Jesse might know, so I'll talk to him. But this will not fix the 
fact that the semaphore is held for a long time :)
Timer tasks that reschedule themselves are a pain.  The watchdog timer task
had a couple of race conditions that were thought to be better fixed by
removing it altogether.  Please, let's not go down that road again!
I understand that the watchdog_task could cause a race when the timer task
and e1000_down run concurrently, resulting in a double free of memory.

I think this problem occurs because the interrupt handler is executed on the
same CPU as the process that has acquired the semaphore.
How about disabling interrupts while the process is holding the semaphore?
I think this is possible once the total lock time has been reduced.
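
To illustrate the idea, here is a minimal user-space sketch, not e1000 code.
It uses a POSIX signal as a stand-in for the interrupt and a plain flag as a
stand-in for the semaphore; fake_irq, install_fake_irq, run_critical_section
and sem_held are hypothetical names made up for this example. In the kernel
the corresponding primitives would be along the lines of local_irq_save() /
local_irq_restore() or spin_lock_irqsave(), and whether they fit the driver
depends on how short the hold time can be made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins: a flag for the semaphore, counters for the handler. */
static volatile sig_atomic_t sem_held;
static volatile sig_atomic_t handler_runs;
static volatile sig_atomic_t handler_saw_held;

/* Plays the role of the interrupt handler: records whether it ran
 * while the interrupted code was still holding the "semaphore". */
static void fake_irq(int signo)
{
    (void)signo;
    handler_runs++;
    if (sem_held)
        handler_saw_held++;
}

static void install_fake_irq(void)
{
    struct sigaction sa;
    int rc;

    sa.sa_handler = fake_irq;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Holds the "semaphore" while an "interrupt" arrives on the same thread.
 * With mask_irq != 0 the signal is blocked for the whole critical section,
 * so delivery is deferred until the mask is restored.
 * Returns how many times the handler saw the semaphore held. */
static int run_critical_section(int mask_irq)
{
    sigset_t block, old;

    handler_runs = 0;
    handler_saw_held = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (mask_irq)
        sigprocmask(SIG_BLOCK, &block, &old);   /* "disable interrupts" */

    sem_held = 1;
    raise(SIGUSR1);                             /* interrupt fires mid-section */
    sem_held = 0;

    if (mask_irq)
        sigprocmask(SIG_SETMASK, &old, NULL);   /* pending signal runs now */

    return handler_saw_held;
}
```

Calling install_fake_irq() once and then run_critical_section(0) lets the
handler run inside the critical section, while run_critical_section(1) defers
it until the flag has been cleared. The handler still runs exactly once in
both cases; it is only delayed, which is why the hold time has to be short.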

-- 
  Kenzo Iwami (k-iwami@cj.jp.nec.com)