Thread: 5 messages, 3 authors, 2007-02-21

Re: watchdog timeout panic in e1000 driver

From: Auke Kok <hidden>
Date: 2007-02-20 16:10:57

Kenzo Iwami wrote:
> Hi,
>
> I created a patch that uses watchdog_task but fixes the race condition
> that occurred in the old e1000 driver.
>
> I've obtained information about the panic caused by the old e1000 driver
> using e1000_watchdog_task. According to the crash dump, the panic was
> caused by a timer_list whose contents were NULL. Further trace
> information revealed that the function in that timer_list was
> e1000_watchdog().
>
> This function is registered in the timer_list during e1000_watchdog_task.
> It seems that e1000_watchdog_task could be called after the adapter is
> removed, so freed memory gets registered in the timer_list.
>
> Looking at the source code, e1000_watchdog_task will be scheduled if
> e1000_watchdog is invoked during e1000_remove after flush_scheduled_work()
> is called but before del_timer_sync() is called in e1000_down().
>
> The attached patch adds back e1000_watchdog_task(), but it prevents the
> old race condition by deleting e1000_watchdog from the timer_list before
> flush_scheduled_work() is called.
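The ordering problem described above can be replayed deterministically. What follows is a simplified userspace model, not driver code: the timer, the work item, the "going down" marker and the adapter memory are all reduced to booleans, and every function and field name in it is hypothetical. remove_buggy() follows the sequence from the crash analysis (flush, timer fires in the window, del_timer_sync(), free, stale task re-arms the timer); remove_fixed() marks the adapter as going down and stops the timer before flushing, so a task queued in the window still runs but does not re-arm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the e1000 watchdog machinery: timer_list,
 * work_struct and adapter memory are reduced to booleans so the
 * teardown ordering can be replayed in userspace. */
struct model {
    bool timer_armed;    /* e1000_watchdog() pending in the timer list */
    bool work_queued;    /* e1000_watchdog_task() pending on the workqueue */
    bool down;           /* "adapter is going down" marker */
    bool adapter_freed;  /* remove path has released the adapter */
    bool use_after_free; /* a freed adapter's timer was re-armed */
};

/* Timer callback: defers the real work to process context. */
static void timer_fires(struct model *m)
{
    if (!m->timer_armed)
        return;
    m->timer_armed = false;
    m->work_queued = true;
}

/* Work item: does its checks, then re-arms the timer.
 * check_down selects whether it honors the "down" marker. */
static void run_queued_work(struct model *m, bool check_down)
{
    if (!m->work_queued)
        return;
    m->work_queued = false;
    if (check_down && m->down)
        return;                  /* tearing down: do not re-arm */
    m->timer_armed = true;
    if (m->adapter_freed)
        m->use_after_free = true;
}

/* Old ordering: the timer can fire between the flush and del_timer_sync(). */
static void remove_buggy(struct model *m)
{
    run_queued_work(m, false);   /* flush_scheduled_work() */
    timer_fires(m);              /* race window: task gets queued again */
    m->timer_armed = false;      /* del_timer_sync() in e1000_down() */
    m->adapter_freed = true;     /* adapter memory released */
    run_queued_work(m, false);   /* stale task re-arms a freed timer */
}

/* Patched ordering: mark down, stop the timer, then flush. */
static void remove_fixed(struct model *m)
{
    m->down = true;
    timer_fires(m);              /* timer may still fire before deletion */
    m->timer_armed = false;      /* del_timer_sync() before the flush */
    run_queued_work(m, true);    /* flush: queued task sees "down" */
    m->adapter_freed = true;
    run_queued_work(m, true);    /* nothing left to run */
    assert(!m->timer_armed && !m->use_after_free);
}
```

Starting from a model whose timer is armed, remove_buggy() ends with use_after_free set and the timer armed again, which corresponds to the NULL-filled timer_list seen in the crash dump; remove_fixed() ends with no armed timer.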

Kenzo,

this looks a lot better than the previous patch!! However, we already have a 
state marker for _down_ that we should probably reuse. Can you try the attached 
patch and see if it works for you? It's basically your patch without the added 
remove flag, using the already available atomic state trackers instead.
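Reusing the existing state marker means the watchdog task checks a bit in the adapter's shared atomic flags word rather than a dedicated remove flag. A rough userspace analogue follows, assuming C11 stdatomic.h as a stand-in for the kernel's set_bit()/clear_bit()/test_bit(). The enum and function names are hypothetical, only loosely modeled on the driver's __E1000_* state bits; check the exact names against the e1000 source before relying on them.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits, loosely modeled on the driver's __E1000_* enum. */
enum adapter_state_bit { STATE_TESTING, STATE_RESETTING, STATE_DOWN };

struct adapter {
    atomic_ulong flags;   /* shared state word, like adapter->flags */
    bool timer_armed;     /* stand-in for the watchdog timer */
};

static unsigned long state_mask(int bit)
{
    assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * CHAR_BIT));
    return 1UL << bit;
}

/* Userspace analogues of the kernel's set_bit()/clear_bit()/test_bit(). */
static void state_set(struct adapter *a, int bit)
{
    atomic_fetch_or(&a->flags, state_mask(bit));
}

static void state_clear(struct adapter *a, int bit)
{
    atomic_fetch_and(&a->flags, ~state_mask(bit));
}

static bool state_test(struct adapter *a, int bit)
{
    return (atomic_load(&a->flags) & state_mask(bit)) != 0;
}

static void adapter_init(struct adapter *a)
{
    atomic_init(&a->flags, 0UL);
    a->timer_armed = false;
}

/* Watchdog task: after its checks, re-arm only while the interface is up. */
static void watchdog_task(struct adapter *a)
{
    if (!state_test(a, STATE_DOWN))
        a->timer_armed = true;
}

/* Down path: publish "down" first, then stop the timer (del_timer_sync()). */
static void adapter_down(struct adapter *a)
{
    state_set(a, STATE_DOWN);
    a->timer_armed = false;
}

static void adapter_up(struct adapter *a)
{
    state_clear(a, STATE_DOWN);
    a->timer_armed = true;
}
```

A task that runs late, after adapter_down(), finds the down bit set and leaves the timer disarmed. One plausible benefit of this design is that any path that takes the interface down, not just remove, gets the same protection without a second flag to keep consistent.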

If this works for you then that is great news and I'll push this patch to the 
upstream kernel maintainers after testing.

Cheers,

Auke

