Thread: 14 messages, 3 authors, 2006-11-16

Re: watchdog timeout panic in e1000 driver

From: Kenzo Iwami <hidden>
Date: 2006-10-25 13:41:36

Hi,

>> This problem originally occurred in a very large cluster system using snmp
>> for server management. About two servers panicked each day. The program I sent
>> is to reproduce this problem in a very short time. It does occur under normal
>> load when there is a lot of servers.
>
> hmm, not good - does your snmp daemon use ethtool excessively? That would
> certainly be painful to the driver (any driver!).

I only looked at the panic message after this problem occurred.
I could tell that the snmp daemon caused the panic while it was processing
an ethtool ioctl, but I don't know how often the ioctl was called.
However, it shouldn't have been called excessively, because the panic occurred
on a production system while it was idle.

> Anyway as I said in the same e-mail, we're working on reducing the lock
> timeout to a reasonable time. This will unfortunately take some time, as we
> need to change some major components in the driver to make sure this doesn't
> happen.

How about the following approach?
If acquiring the semaphore fails inside the interrupt handler, the attempt
is abandoned immediately instead of waiting for the timeout.
However, I don't know whether this method affects other processes.
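
To make the idea concrete, here is a minimal userspace sketch. It is not e1000 code: struct hw_sem, sem_try_acquire(), sem_acquire() and the in_irq flag are hypothetical names made up for illustration, and a plain boolean stands in for the real semaphore. It only shows the control flow, where an interrupt-context caller gets exactly one attempt and a process-context caller keeps its retry budget.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the semaphore; NOT the e1000 register layout. */
struct hw_sem {
    bool held;
};

enum { SEM_OK = 0, SEM_BUSY = -1 };

/* One non-blocking attempt: succeeds only if the semaphore is free. */
static bool sem_try_acquire(struct hw_sem *sem)
{
    if (sem->held)
        return false;
    sem->held = true;
    return true;
}

static void sem_release(struct hw_sem *sem)
{
    sem->held = false;
}

/*
 * Acquire with a retry budget. When in_irq is true the budget is forced
 * to a single attempt, so an interrupt handler gives up immediately
 * instead of spinning until the timeout expires.
 * tries_used (optional) reports how many attempts were made.
 */
static int sem_acquire(struct hw_sem *sem, bool in_irq, int max_tries,
                       int *tries_used)
{
    int limit = in_irq ? 1 : max_tries;
    int i;

    for (i = 0; i < limit; i++) {
        if (tries_used)
            *tries_used = i + 1;
        if (sem_try_acquire(sem))
            return SEM_OK;
        /* A real driver would delay here, in process context only. */
    }
    return SEM_BUSY;
}
```

With the semaphore already held, sem_acquire(&sem, true, 1000, &tries) returns SEM_BUSY after one attempt, while the same call with in_irq set to false uses the full budget. The caller in the interrupt path would then have to skip or defer the work that needed the semaphore, which is the part whose side effects I am unsure about.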

-- 
  Kenzo Iwami (k-iwami@cj.jp.nec.com)