Thread: 38 messages, 5 authors, 2022-03-21

Re: [bug report] NVMe/IB: reset_controller need more than 1min

From: Yi Zhang <hidden>
Date: 2021-12-11 03:02:23
Also in: linux-rdma

On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang [off-list ref] wrote:
On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg [off-list ref] wrote:
Hello

Gentle ping here, this issue still exists on latest 5.13-rc7

# time nvme reset /dev/nvme0

real 0m12.636s
user 0m0.002s
sys 0m0.005s
# time nvme reset /dev/nvme0

real 0m12.641s
user 0m0.000s
sys 0m0.007s
Strange that even normal resets take so long...
What device are you using?
Hi Sagi

Here is the device info:
Mellanox Technologies MT27700 Family [ConnectX-4]
# time nvme reset /dev/nvme0

real 1m16.133s
user 0m0.000s
sys 0m0.007s
There seems to be a spurious command timeout here, but maybe this
is due to the fact that the queues take so long to connect and
the target expires the keep-alive timer.

Does this patch help?
The issue still exists, let me know if you need more testing for it. :)
Hi Sagi
Ping: this issue can still be reproduced on the latest
linux-block/for-next. Do you have a chance to recheck it? Thanks.

--
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 7d0f3523fdab..f4a7db1ab3e5 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -142,6 +142,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
                 }
         }

+       /*
+        * Controller establishment flow may take some time, and the host may not
+        * send us keep-alive during this period, hence reset the
+        * traffic based keep-alive timer so we don't trigger a
+        * controller teardown as a result of a keep-alive expiration.
+        */
+       ctrl->reset_tbkas = true;
+
         return 0;

  err:
--
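
The intent of the patch above can be illustrated with a minimal userspace sketch of a traffic-based keep-alive timer. This is only a simplified model, not the kernel code: the names model_ctrl, model_ctrl_init, model_note_activity and model_keep_alive_timer are hypothetical, and it assumes the timer handler re-arms itself instead of declaring expiry whenever the reset_tbkas flag was set since the last check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the target's controller state. */
struct model_ctrl {
	unsigned int kato_sec;        /* keep-alive timeout, in seconds */
	unsigned int next_expiry_sec; /* time at which the timer next fires */
	bool reset_tbkas;             /* activity seen since the last check */
	bool fatal;                   /* set once the keep-alive timer expires */
};

static void model_ctrl_init(struct model_ctrl *c, unsigned int kato_sec,
			    unsigned int now_sec)
{
	assert(kato_sec > 0);
	c->kato_sec = kato_sec;
	c->next_expiry_sec = now_sec + kato_sec;
	c->reset_tbkas = false;
	c->fatal = false;
}

/*
 * Record activity. In this model, the patch corresponds to calling this
 * whenever a queue is installed, so that a slow connect phase counts as
 * activity even though the host sends no keep-alive commands.
 */
static void model_note_activity(struct model_ctrl *c)
{
	c->reset_tbkas = true;
}

/*
 * Timer check. Returns true if the controller has been torn down.
 * If activity was recorded, consume the flag and re-arm the timer;
 * otherwise treat the expiry as fatal.
 */
static bool model_keep_alive_timer(struct model_ctrl *c, unsigned int now_sec)
{
	if (c->fatal)
		return true;
	if (now_sec < c->next_expiry_sec)
		return false;
	if (c->reset_tbkas) {
		c->reset_tbkas = false;
		c->next_expiry_sec = now_sec + c->kato_sec;
		return false;
	}
	c->fatal = true;
	return true;
}
```

In this model, one recorded activity buys exactly one more timeout period. If queue setup stalls for longer than that without further activity, the timer still expires, which would be consistent with the report that the issue persists with the patch applied.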
target:
[  934.306016] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
[  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  944.900051] nvmet: ctrl 1 fatal error occurred!
[ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.

client:
[  857.264029] nvme nvme0: resetting controller
[  864.115369] nvme nvme0: creating 40 I/O queues.
[  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[  868.001673] nvme nvme0: resetting controller
[  935.396789] nvme nvme0: I/O 9 QID 0 timeout
[  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
[  935.438080] nvme nvme0: creating 40 I/O queues.
[  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.

--
Best Regards,
  Yi Zhang

