Re: "I/O 8 QID 0 timeout, reset controller" on 5.6-rc2
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: 2020-03-04 11:19:48
Also in: linux-nvme
On Wed, Mar 4, 2020 at 6:02 AM Keith Busch [off-list ref] wrote:
> On Mon, Mar 02, 2020 at 10:03:39AM +0800, Jason A. Donenfeld wrote:
> > Hi,
> >
> > My torrent client was doing some I/O when the below happened. I'm
> > wondering if this is a known thing that's been fixed during the rc
> > cycle, a regression, or if my (pretty new) NVMe drive is failing.
> >
> > Thanks,
> > Jason
> >
> > Feb 24 20:36:58 thinkpad kernel: nvme nvme1: I/O 852 QID 15 timeout, aborting
> > Feb 24 20:37:29 thinkpad kernel: nvme nvme1: I/O 852 QID 15 timeout, reset controller
> > Feb 24 20:37:59 thinkpad kernel: nvme nvme1: I/O 8 QID 0 timeout, reset controller
> > Feb 24 20:39:00 thinkpad kernel: nvme nvme1: Device not ready; aborting reset
> > Feb 24 20:39:00 thinkpad kernel: nvme nvme1: Abort status: 0x371
>
> Sorry to say, this indicates the controller has become unresponsive.
> You usually see "timeout" messages in batches, though, so I wonder if
> only the one IO command timed out or if the controller just doesn't
> support an abort command limit.
>
> You can try throttling the queue depth and see if the problem goes
> away. The lowest possible depth can be set with kernel param
> "nvme.io_queue_depth=2".
I was unfortunately never able to reproduce. This happened while downloading a torrent, and torrent clients have a history of creating "interesting" I/O patterns. Hardware is "Samsung SSD 970 EVO Plus 2TB".
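
For anyone who wants to check whether their own timeouts arrive in batches or as a single stuck command, here is a minimal sketch that tallies the "timeout" lines per controller and queue. The regular expression is inferred only from the log lines quoted above and may not match other kernel versions; the names TIMEOUT_RE and parse_timeouts are hypothetical helpers, not part of any kernel tooling.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread, e.g.
#   "nvme nvme1: I/O 852 QID 15 timeout, aborting"
# This is an assumption; other kernels may word the message differently.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)


def parse_timeouts(lines):
    """Return (controller, qid, tag, action) for every timeout line."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line.strip())
        if m:
            events.append(
                (m["ctrl"], int(m["qid"]), int(m["tag"]), m["action"])
            )
    return events


# Sample input: the five log lines from the original report.
LOG = """\
Feb 24 20:36:58 thinkpad kernel: nvme nvme1: I/O 852 QID 15 timeout, aborting
Feb 24 20:37:29 thinkpad kernel: nvme nvme1: I/O 852 QID 15 timeout, reset controller
Feb 24 20:37:59 thinkpad kernel: nvme nvme1: I/O 8 QID 0 timeout, reset controller
Feb 24 20:39:00 thinkpad kernel: nvme nvme1: Device not ready; aborting reset
Feb 24 20:39:00 thinkpad kernel: nvme nvme1: Abort status: 0x371
"""

events = parse_timeouts(LOG.splitlines())

# Timeout messages per (controller, queue).
per_queue = Counter((ctrl, qid) for ctrl, qid, _, _ in events)

# Distinct commands that timed out, identified by (queue, tag).
distinct_commands = {(qid, tag) for _, qid, tag, _ in events}

for (ctrl, qid), count in sorted(per_queue.items()):
    print(f"{ctrl} QID {qid}: {count} timeout message(s)")
print(f"distinct commands: {len(distinct_commands)}")
```

On the log above this counts three timeout messages covering two distinct commands: tag 852 on QID 15, reported twice, and tag 8 on QID 0, which is the admin queue.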