Re: [bug report] NVMe/IB: reset_controller need more than 1min
From: Yi Zhang <hidden>
Date: 2021-12-11 03:02:23
Also in:
linux-rdma
On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang [off-list ref] wrote:
> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg [off-list ref] wrote:
> >
> > > Hello
> > >
> > > Gentle ping here, this issue still exists on latest 5.13-rc7
> > >
> > > # time nvme reset /dev/nvme0
> > > real 0m12.636s
> > > user 0m0.002s
> > > sys 0m0.005s
> > > # time nvme reset /dev/nvme0
> > > real 0m12.641s
> > > user 0m0.000s
> > > sys 0m0.007s
> >
> > Strange that even normal resets take so long...
> > What device are you using?
>
> Hi Sagi
>
> Here is the device info:
> Mellanox Technologies MT27700 Family [ConnectX-4]
>
> > > # time nvme reset /dev/nvme0
> > > real 1m16.133s
> > > user 0m0.000s
> > > sys 0m0.007s
> >
> > There seems to be a spurious command timeout here, but maybe this
> > is due to the fact that the queues take so long to connect and
> > the target expires the keep-alive timer.
> >
> > Does this patch help?
>
> The issue still exists, let me know if you need more testing for it. :)

Hi Sagi,

Ping: this issue can still be reproduced on the latest linux-block/for-next.
Do you have a chance to recheck it? Thanks.

> > --
> > diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> > index 7d0f3523fdab..f4a7db1ab3e5 100644
> > --- a/drivers/nvme/target/fabrics-cmd.c
> > +++ b/drivers/nvme/target/fabrics-cmd.c
> > @@ -142,6 +142,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
> >                 }
> >         }
> >
> > +       /*
> > +        * Controller establishment flow may take some time, and the host may not
> > +        * send us keep-alive during this period, hence reset the
> > +        * traffic based keep-alive timer so we don't trigger a
> > +        * controller teardown as a result of a keep-alive expiration.
> > +        */
> > +       ctrl->reset_tbkas = true;
> > +
> >         return 0;
> >
> >  err:
> > --
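
The idea behind the quoted patch can be sketched in user space as below. This is only an illustration of a traffic-based keep-alive flag, not the kernel code: `struct fake_ctrl`, `note_traffic()` and `on_keep_alive_timer()` are hypothetical names, and only the `reset_tbkas` flag name comes from the patch itself.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the target-side controller state.
 * Only the reset_tbkas name is taken from the patch quoted above.
 */
struct fake_ctrl {
	bool reset_tbkas;	/* set when liveness was observed this period */
	bool torn_down;		/* set once the keep-alive is treated as expired */
};

/* Record activity that should count as a sign of life from the host. */
void note_traffic(struct fake_ctrl *ctrl)
{
	ctrl->reset_tbkas = true;
}

/*
 * Called each time one keep-alive period elapses.
 * Returns true if the timer is re-armed, false if the controller is torn down.
 */
bool on_keep_alive_timer(struct fake_ctrl *ctrl)
{
	bool seen = ctrl->reset_tbkas;

	ctrl->reset_tbkas = false;	/* consume the flag for the next period */
	if (seen)
		return true;		/* activity seen: re-arm instead of expiring */

	ctrl->torn_down = true;		/* a full period passed with no activity */
	return false;
}
```

In this sketch, setting the flag when a queue is installed would play the role of `note_traffic()`: each queue connect buys one more keep-alive period, so a slow connect sequence would not by itself count as an expired keep-alive.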
>
> target:
> [ 934.306016] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> [ 944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [ 944.900051] nvmet: ctrl 1 fatal error occurred!
> [ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
>
> client:
> [ 857.264029] nvme nvme0: resetting controller
> [ 864.115369] nvme nvme0: creating 40 I/O queues.
> [ 867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> [ 868.001673] nvme nvme0: resetting controller
> [ 935.396789] nvme nvme0: I/O 9 QID 0 timeout
> [ 935.402036] nvme nvme0: Property Set error: 881, offset 0x14
> [ 935.438080] nvme nvme0: creating 40 I/O queues.
> [ 939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>
> --
> Best Regards,
>   Yi Zhang
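
To make the stall in the quoted client log easier to see, the gap between two stamped log lines can be computed with a small helper. This is a rough sketch that assumes the usual `[ seconds.micros]` dmesg prefix; `parse_stamp()` and `gap_secs()` are hypothetical helper names, not part of any existing tool.

```c
#include <stdio.h>

/*
 * Parse the leading "[ seconds.micros]" stamp of a dmesg-style line.
 * Returns 0 on success, -1 if the line does not start with such a stamp.
 */
int parse_stamp(const char *line, double *secs)
{
	return sscanf(line, " [ %lf", secs) == 1 ? 0 : -1;
}

/*
 * Seconds elapsed between two stamped lines from the same machine,
 * or -1.0 if either line fails to parse.
 */
double gap_secs(const char *earlier, const char *later)
{
	double a, b;

	if (parse_stamp(earlier, &a) || parse_stamp(later, &b))
		return -1.0;
	return b - a;
}
```

Applied to the client lines `[ 868.001673] nvme nvme0: resetting controller` and `[ 935.396789] nvme nvme0: I/O 9 QID 0 timeout`, this gives roughly 67.4 seconds, which appears to account for most of the 1m16s reset. The target and client stamps come from different machines, so they should only be compared within one log.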

--
Best Regards,
  Yi Zhang