Possible regression between 4.9 and 4.13
From: Mason <hidden>
Date: 2017-08-30 08:55:37
Also in:
linux-pci
On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
> To get back to the original issue here, the hardware seems to have died, the driver stops talking to it, and all is good. The "regression" here is that we now properly can determine that the hardware is crap.
Before 4.12, when I unplugged my USB3 Flash drive, Linux would detect a few "Uncorrected Non-Fatal errors" via AER, but it was still possible to plug the drive back in. Since 4.12, once I unplug the drive, the whole USB3 card is marked as dead (all 4 ports), and I can no longer plug anything in (not even the USB2 drive that didn't have any issues, IIRC). It seems a bit premature to "mark as dead" something that remains functional, doesn't it?

Disclaimer: there are many variables in this setup, and I've only tested a small fraction of the problem space: only one system, only one USB3 board, only one USB3 Flash drive.
> So, how do you think we should proceed, delay a bit longer before saying the device is gone? How long is "long enough"? How many bus errors are we allowed to tolerate (hint, the PCI spec says none...) Maybe someone wants to get to the root problem here, why is the hardware suddenly reporting all 1s?
I'm afraid I won't be able to make any progress on this front, unless I can get my hands on a PCIe packet analyzer.

Regards.
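
To make the "how long is long enough" question concrete, here is a minimal sketch of a tolerance policy, assuming the trigger is a register read that returns all 1s. It is not the actual driver logic; the names gone_detector, gone_detector_update and reads_until_gone are hypothetical and exist only for illustration. A threshold of 1 corresponds to declaring the device gone on the first bad read, while a larger threshold tolerates short bursts of errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_ONES 0xffffffffu  /* value a read returns when the device does not respond */

/* Hypothetical policy state, not taken from any kernel driver. */
struct gone_detector {
    unsigned consecutive_bad;  /* all-ones reads seen in a row */
    unsigned threshold;        /* bad reads in a row needed to declare the device gone */
};

/* Feed one register value; returns true once the device should be treated as gone. */
static bool gone_detector_update(struct gone_detector *d, uint32_t reg_value)
{
    if (reg_value == ALL_ONES)
        d->consecutive_bad++;
    else
        d->consecutive_bad = 0;  /* any valid read resets the count */
    return d->consecutive_bad >= d->threshold;
}

/*
 * Replay a sequence of register reads through the detector.
 * Returns the 1-based index of the read at which the device is
 * declared gone, or 0 if the sequence never trips the threshold.
 */
static size_t reads_until_gone(const uint32_t *reads, size_t n, unsigned threshold)
{
    struct gone_detector d = { .consecutive_bad = 0, .threshold = threshold };

    assert(threshold > 0);
    for (size_t i = 0; i < n; i++) {
        if (gone_detector_update(&d, reads[i]))
            return i + 1;
    }
    return 0;
}
```

With a threshold of 1, a single all-1s read is enough to mark the device dead, which matches the behaviour described above for 4.12 and later. Any larger threshold accepts some bus errors, and the quoted reply points out that the PCI spec allows none.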