Thread (21 messages) 21 messages, 3 authors, 2006-09-12

Re: pci error recovery procedure

From: Zhang, Yanmin <hidden>
Date: 2006-09-07 01:58:14
Also in: lkml

On Thu, 2006-09-07 at 04:01, Linas Vepstas wrote:
On Wed, Sep 06, 2006 at 09:26:56AM +0800, Zhang, Yanmin wrote:
quoted
quoted
quoted
The
error_detected of the drivers in the latest kernel who support err handlers
always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples.
Just because the current drivers do it this way does not mean that this is
the best way to do things.
If it's not the best way, why did you choose to reset slot for e1000/e100/ipr
error handlers? They are typical widely-used devices. To make it easier to
add error handlers?
I did it that way just to get going, get something working. I do not
have hardware specs for any of these devices, and do not have much of 
an idea of what they are capable of;
Yes, it's difficult to add fine-grained error handlers for guys who are not
the driver developers.
 the recovery code I wrote is of
"brute force, hit it with a hammer"-nature.  Driver writers who 
know thier hardware well, and are interested in a more refined 
approach are encouraged to actualy use a more refined approach.
I guess almost no driver developer is happy to spend lots of time to
add refined steps. They would like to focus on normal process (for achievement
feeling? :) ).
In addition, if they use fine-grained steps in error handlers, all these
steps might be rewritten when the device specs is upgraded. Fine-grained steps in
error handlers are more difficut to debug.

It's impossible for you to develop error handlers for all device drivers.

The error handlers look a little like suspend/resume. Of course, it's more
complicated. If we could keep it as simple as suspend/resume, it's more welcomed.

pci error shouldn't happen frequently. And when it happens, I think mostly it's
an endpoint device instead of bridge. When it happens, if we choose always
reset slot, performance could be degraded, but not too much. I just deduce, and 
didn't test it on a machine with hundreds of devices.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help