Possible regression between 4.9 and 4.13
From: Mason <hidden>
Date: 2017-08-31 09:39:39
Also in:
linux-pci
On 30/08/2017 11:06, Greg Kroah-Hartman wrote:
> On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
>> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>>> To get back to the original issue here, the hardware seems to have died, the driver stops talking to it, and all is good. The "regression" here is that we now properly can determine that the hardware is crap.
>>
>> Before 4.12, when I unplugged my USB3 Flash drive, Linux would detect a few "Uncorrected Non-Fatal errors" via AER, but it was still possible to plug the drive back in. Since 4.12, once I unplug the drive, the whole USB3 card is marked as dead (all 4 ports), and I can no longer plug anything in (not even the USB2 drive that didn't have any issues, IIRC). It seems a bit premature to "mark as dead" something that remains functional, doesn't it?
>
> I agree, but if the device sends all ones, it's a good indication it is really dead, right? Or something is wrong with it.

I wouldn't call it dead if I can plug the drive back in, and have it working... But I agree that something fishy is happening...

>> Disclaimer, there are many variables in this setup, and I've only tested a small fraction of the problem space: only one system, only one USB3 board, only one USB3 Flash drive.
>
> Did you ever happen to narrow this down to a single git commit using 'git bisect'? I can't remember what happened in the beginning of this thread...

Mathias pointed out d9f11ba9f107aa335091ab8d7ba5eea714e46e8b

>>> So, how do you think we should proceed, delay a bit longer before saying the device is gone? How long is "long enough"? How many bus errors are we allowed to tolerate (hint, the PCI spec says none...) Maybe someone wants to get to the root problem here, why is the hardware suddenly reporting all 1s?
>>
>> I'm afraid I won't be able to make any progress on this front, unless I can get my hands on a PCIe packet analyzer.
>
> Odds of that happening are pretty rare, right? I've never even seen one of those...

I had a "Summit T24 Analyzer" on my desk a few months ago, but I was getting strange results, and the knowledgeable people in my company were not available at the time.
http://teledynelecroy.com/protocolanalyzer/protocoloverview.aspx?seriesid=445

Regards.
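
For illustration only, the "how many all-ones reads before the device is declared gone" question raised in the thread could be sketched as below. This is not the xhci driver's code and not what the commit named above does; `read_looks_dead`, `struct dead_tracker`, `tracker_update` and the limit of 3 are all hypothetical names and values made up for this sketch. As noted in the thread, the PCI spec tolerates no such errors, so any limit above 1 is a policy choice, not something the spec sanctions.

```c
#include <stdbool.h>
#include <stdint.h>

/* A register read that comes back as all ones is the symptom
 * discussed in the thread ("the device sends all ones"). */
#define ALL_ONES 0xFFFFFFFFu

/* Hypothetical helper (not a kernel API): does this 32-bit
 * register value look like a read from a device that is gone? */
static bool read_looks_dead(uint32_t val)
{
    return val == ALL_ONES;
}

/* Hypothetical policy state: count consecutive all-ones reads and
 * declare the device dead only once `limit` is reached. */
struct dead_tracker {
    unsigned int consecutive; /* all-ones reads seen in a row */
    unsigned int limit;       /* assumed threshold, e.g. 3 */
};

/* Feed one register read into the tracker.
 * Returns true once `limit` consecutive all-ones reads were seen;
 * any other value resets the counter. */
static bool tracker_update(struct dead_tracker *t, uint32_t val)
{
    if (read_looks_dead(val))
        t->consecutive++;
    else
        t->consecutive = 0;

    return t->consecutive >= t->limit;
}
```

With `limit` set to 1 this degenerates to "the first all-ones read marks the device dead"; a larger limit trades spec strictness for tolerance of a transient glitch such as the unplug event described above.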