Thread: 14 messages, 3 authors, 2006-11-16

Re: watchdog timeout panic in e1000 driver

From: Kenzo Iwami <hidden>
Date: 2006-11-16 09:23:27

Hi,

Thank you for your comment.
> > I think this problem occurs because the interrupt handler is executed on
> > the same CPU as the process that acquires the semaphore.
> > How about disabling interrupts while the process is holding the semaphore?
> > I think this is possible, if the total lock time has been reduced.
> > I created the attached patch based on the method described above.
> > This patch disables interrupts while the process is holding the semaphore.
> [...]
> I'm not sure why you would have to disable interrupts when freeing the
> semaphore, but more importantly I don't want to introduce irq code in the
> swfw handling functions.
>
> Since the major (only) user running this piece of code in interrupt context
> is the watchdog, we might as well see if we can only disable interrupts for
> that code path, which would only be once per 2 seconds. We don't need to
> protect the ethtool path into this code as it doesn't run in irq context.
>
> Would you mind giving attached patch a try and let me know if it works for
> you? It will disable irqs for a bit longer time than your patch, and it begs
> for a special check-link-in-watchdog function that doesn't take so damn
> long :(

I tried your patch, but the system panicked with the same symptom.

This problem occurs because e1000_watchdog is called from the interrupt
handler while ethtool processing is holding the semaphore.

  ethtool processing holding semaphore
    INTERRUPT
      e1000_watchdog waits for semaphore to be released

The semaphore e1000_watchdog is waiting for can only be released when
ethtool resumes after the interrupt returns, and the interrupt cannot
return until e1000_watchdog finishes (basically a deadlock).
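
To make the sequence above concrete, here is a small userspace model of it.
It is only a sketch and not driver code: sem_held, try_acquire, release,
watchdog_in_irq and ethtool_path_interrupted are hypothetical names, and the
"interrupt" is modelled as a plain function call made while the semaphore is
held. The bounded spin stands in for the wait inside e1000_watchdog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the semaphore bit shared by both paths. */
static bool sem_held = false;

static bool try_acquire(void)
{
    if (sem_held)
        return false;
    sem_held = true;
    return true;
}

static void release(void)
{
    assert(sem_held);
    sem_held = false;
}

/*
 * Models e1000_watchdog running in interrupt context.  While it spins,
 * nothing else runs on this CPU, so sem_held cannot change under it.
 * Returns true if it obtained the semaphore, false if it gave up.
 */
static bool watchdog_in_irq(int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (try_acquire()) {
            /* link check would happen here */
            release();
            return true;
        }
    }
    return false; /* the holder never got a chance to release */
}

/*
 * Models the ethtool path: it takes the semaphore, is interrupted
 * before releasing it, and only resumes once the handler returns.
 * Returns what the watchdog saw.
 */
static bool ethtool_path_interrupted(int max_spins)
{
    bool got = try_acquire();
    assert(got);
    bool watchdog_ok = watchdog_in_irq(max_spins); /* INTERRUPT */
    release();                                     /* too late */
    return watchdog_ok;
}
```

In this model the watchdog fails no matter how large max_spins is, because
the release is only reached after the handler has returned. Called on its own,
with nobody holding the semaphore, the same watchdog succeeds immediately.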

In order to solve this problem, interrupts have to be disabled on the
interrupted side (during ethtool processing) and not during
e1000_watchdog within the interrupt handler.

e1000_get_hw_eeprom_semaphore is called from both the interrupt level
and the normal level, and needs to be protected by irq code. The patch
disables interrupts when freeing the semaphore because
e1000_swfw_sync_release also calls e1000_get_hw_eeprom_semaphore.
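
The effect of masking interrupts on the interrupted side can be sketched
with another userspace model. This is not the attached patch: irqs_enabled,
irq_pending, irq_save, irq_restore, raise_irq, watchdog and process_path are
hypothetical names, and the model assumes an interrupt raised while masked
is simply delivered once interrupts are restored. In the kernel the
equivalent bracketing would be done with primitives such as
local_irq_save() and local_irq_restore().

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;  /* models the CPU interrupt flag      */
static bool irq_pending  = false; /* interrupt raised while masked      */
static bool hw_sem_held  = false; /* models the hardware semaphore      */
static int  watchdog_ok    = 0;   /* watchdog runs that got the sem     */
static int  watchdog_stuck = 0;   /* runs that found it held (deadlock) */

/* Models e1000_watchdog: it needs the semaphore to do its work. */
static void watchdog(void)
{
    if (hw_sem_held) {
        watchdog_stuck++; /* real code would spin here forever */
        return;
    }
    hw_sem_held = true;   /* take it, check link, release it */
    hw_sem_held = false;
    watchdog_ok++;
}

/* Deliver the interrupt now, or remember it if interrupts are masked. */
static void raise_irq(void)
{
    if (irqs_enabled)
        watchdog();
    else
        irq_pending = true;
}

static bool irq_save(void)
{
    bool was_enabled = irqs_enabled;
    irqs_enabled = false;
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irqs_enabled = was_enabled;
    if (irqs_enabled && irq_pending) {
        irq_pending = false;
        watchdog(); /* deferred interrupt runs after the release */
    }
}

/* Models the process-level path, with or without masking. */
static void process_path(bool mask_irqs)
{
    bool flags = true;
    if (mask_irqs)
        flags = irq_save();
    hw_sem_held = true;  /* acquire */
    raise_irq();         /* timer fires while the semaphore is held */
    assert(hw_sem_held);
    hw_sem_held = false; /* release */
    if (mask_irqs)
        irq_restore(flags);
}
```

Without masking, the watchdog runs while the semaphore is held and gets
stuck. With masking around both the acquire and the release, it is deferred
until the semaphore is free. Masking only inside the watchdog would not help,
since the watchdog is the side that is waiting, not the side that is holding.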

--
  Kenzo Iwami (k-iwami@cj.jp.nec.com)