Re: [Bug 215157] New: nvme delete-ns over nvme-tcp to nvmet-tcp error
From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-12-08 13:05:53
On 11/29/21 8:01 PM, Keith Busch wrote:
[adding Sagi & linux-nvme mailing list] On Sat, Nov 27, 2021 at 11:24:57PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:<snip>quoted
Reproduction: modprobe nvmet modprobe nvmet-tcp truncate --size 1G /tmp/foo1.img DEV=$(losetup --find /tmp/foo1.img) mkdir /sys/kernel/config/nvmet/subsystems/testsubsystem cd /sys/kernel/config/nvmet/subsystems/testsubsystem echo 1 > attr_allow_any_host echo "${DEV}" > device_path echo 1 > enable mkdir /sys/kernel/config/nvmet/ports/1 cd /sys/kernel/config/nvmet/ports/1 echo 4420 > addr_trsvcid echo 127.0.0.1 > addr_traddr echo tcp > addr_trtype echo ipv4 > addr_adrfam ln -s /sys/kernel/config/nvmet/subsystems/testsubsystem/ /sys/kernel/config/nvmet/ports/1/subsystems/testsubsystem modprobe nvme-tcp nvme connect -t tcp -n testsubsystem -a 127.0.0.1 -s 4420 nvme list nvme delete-ns -n 1 /dev/nvmeX Eventually will finish with Interuptted, but the device will be unusable after this. The nvme-tcp shows issues in dmesg: [10638.139221] nvmet: creating nvm controller 1 for subsystem testsubsystem for NQN nqn.2014-08.org.nvmexpress:uuid:88a968b7-a5d7-4549-aaa8-a31f1d25aa4b. [10638.139677] nvme nvme2: creating 32 I/O queues. [10638.141556] nvme nvme2: mapped 32/0/0 default/read/poll queues. [10638.145884] nvme nvme2: new ctrl: NQN "testsubsystem", addr 127.0.0.1:4420 [10678.676734] nvmet_tcp: failed cmd 00000000fdc07f80 id 4100 opcode 13, data_len: 0 [10684.129665] nvme nvme2: queue 0: timeout request 0x0 type 4The target sees the opcode 13 (0xd) for the delete namespace management command and correcty decides it is unsupported, so the request initialization fails. Here's where it looks like the target code goes wrong: nvmet_tcp_try_recv_one nvmet_tcp_done_recv_pdu nvmet_req_init __nvmet_req_complete nvmet_tcp_queue_response <- cmd is added to the queue resp_list nvmet_tcp_handle_req_failure() -> rcv_state set to NVMET_TCP_RECV_DATA, flags set to NVMET_TCP_F_INIT_FAILED
nvmet_tcp_handle_req_failure will not do anything if the command does not have any incapsule data (which is the case here).
nvmet_tcp_try_recv_data
nvmet_tcp_execute_request <- sees NVMET_TCP_F_INIT_FAILED
nvmet_tcp_queue_response <- cmd is added to the queue resp_list a 2nd timeThere is a subtle case here as if this is a command that we are queueing back is the current command being processed (which is the case if we want to fail a command that we still need to accept the incapsule data) we don't attach it to the resp_list. I do agree this is convoluted and needs a cleaner failure handling...
That can't be right: the cmd shouldn't be added to the response list twice. Sagi, am I missing something here?quoted
[10684.129673] nvme nvme2: starting error recovery [10684.131074] nvme nvme2: failed nvme_keep_alive_end_io error=10 [10684.137695] nvme nvme2: Reconnecting in 10 seconds... [10691.809468] nvmet: ctrl 1 keep-alive timer (5 seconds) expired! [10691.809519] nvmet: ctrl 1 fatal error occurred! [10756.832004] nvme nvme2: queue 0: timeout request 0x0 type 4 [10756.832137] nvme nvme2: Connect command failed, error wo/DNR bit: 881 [10756.832199] nvme nvme2: failed to connect queue: 0 ret=881 [10756.832222] nvme nvme2: Failed reconnect attempt 1 [10756.832225] nvme nvme2: Reconnecting in 10 seconds... [10814.174824] INFO: task kworker/11:2:11273 blocked for more than 122 seconds. [10814.174832] Tainted: G E 5.16.0-rc2 #1 [10814.174834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [10814.174835] task:kworker/11:2 state:D stack: 0 pid:11273 ppid: 2 flags:0x00004000 [10814.174841] Workqueue: events nvmet_tcp_release_queue_work [nvmet_tcp] [10814.174850] Call Trace: [10814.174852] <TASK> [10814.174855] __schedule+0x23f/0xe50 [10814.174861] ? update_load_avg+0x77/0x4d0 [10814.174866] schedule+0x50/0xd0 [10814.174869] schedule_timeout+0x99/0xd0 [10814.174873] ? raw_spin_rq_lock_nested+0x19/0x80 [10814.174876] wait_for_completion+0x74/0xc0 [10814.174880] __flush_work.isra.0+0x14b/0x210 [10814.174884] ? flush_workqueue_prep_pwqs+0x110/0x110 [10814.174887] nvmet_tcp_release_queue_work+0xc3/0x330 [nvmet_tcp] [10814.174892] process_one_work+0x1ad/0x310 [10814.174895] worker_thread+0x48/0x3d0 [10814.174899] ? process_one_work+0x310/0x310 [10814.174901] kthread+0x155/0x180 [10814.174904] ? set_kthread_struct+0x30/0x30 [10814.174907] ret_from_fork+0x1f/0x30 [10814.174912] </TASK> [10830.558291] nvme nvme2: queue 0: timeout request 0x0 type 4 [10830.558422] nvme nvme2: Connect command failed, error wo/DNR bit: 881 [10830.558484] nvme nvme2: failed to connect queue: 0 ret=881 [10830.558507] nvme nvme2: Failed reconnect attempt 2 [10830.558510] nvme nvme2: Reconnecting in 10 seconds... [10904.284728] nvme nvme2: queue 0: timeout request 0x0 type 4 [10904.284757] nvme nvme2: Connect command failed, error wo/DNR bit: 881 [10904.284818] nvme nvme2: failed to connect queue: 0 ret=881 [10904.284842] nvme nvme2: Failed reconnect attempt 3 [10904.284845] nvme nvme2: Reconnecting in 10 seconds...
The issue here seems to be that the controller fails the command but it still times out for the host - which suggest that the error response is not actually queued.