Re: [Bug 215157] New: nvme delete-ns over nvme-tcp to nvmet-tcp error

From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-12-08 13:05:53


On 11/29/21 8:01 PM, Keith Busch wrote:

[adding Sagi & linux-nvme mailing list]

On Sat, Nov 27, 2021 at 11:24:57PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:

quoted

https://bugzilla.kernel.org/show_bug.cgi?id=215157

<snip>

quoted

Reproduction:


modprobe nvmet
modprobe nvmet-tcp
truncate --size 1G /tmp/foo1.img
DEV=$(losetup --find /tmp/foo1.img)

mkdir /sys/kernel/config/nvmet/subsystems/testsubsystem
cd /sys/kernel/config/nvmet/subsystems/testsubsystem
echo 1 > attr_allow_any_host
echo "${DEV}" > device_path
echo 1 > enable

mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
echo 4420 > addr_trsvcid
echo 127.0.0.1 > addr_traddr
echo tcp > addr_trtype
echo ipv4 > addr_adrfam
ln -s /sys/kernel/config/nvmet/subsystems/testsubsystem/
/sys/kernel/config/nvmet/ports/1/subsystems/testsubsystem

modprobe nvme-tcp
nvme connect -t tcp -n testsubsystem -a 127.0.0.1 -s 4420
nvme list
nvme delete-ns -n 1 /dev/nvmeX


Eventually will finish with Interuptted, but the device will be unusable after
this.

The nvme-tcp shows issues in dmesg:

[10638.139221] nvmet: creating nvm controller 1 for subsystem testsubsystem for
NQN nqn.2014-08.org.nvmexpress:uuid:88a968b7-a5d7-4549-aaa8-a31f1d25aa4b.
[10638.139677] nvme nvme2: creating 32 I/O queues.
[10638.141556] nvme nvme2: mapped 32/0/0 default/read/poll queues.
[10638.145884] nvme nvme2: new ctrl: NQN "testsubsystem", addr 127.0.0.1:4420
[10678.676734] nvmet_tcp: failed cmd 00000000fdc07f80 id 4100 opcode 13, data_len: 0
[10684.129665] nvme nvme2: queue 0: timeout request 0x0 type 4

The target sees the opcode 13 (0xd) for the delete namespace management
command and correcty decides it is unsupported, so the request
initialization fails. Here's where it looks like the target code goes
wrong:

nvmet_tcp_try_recv_one
   nvmet_tcp_done_recv_pdu
     nvmet_req_init
       __nvmet_req_complete
         nvmet_tcp_queue_response <- cmd is added to the queue resp_list
     nvmet_tcp_handle_req_failure() -> rcv_state set to NVMET_TCP_RECV_DATA, flags set to NVMET_TCP_F_INIT_FAILED

nvmet_tcp_handle_req_failure will not do anything if the command does
not have any incapsule data (which is the case here).

   nvmet_tcp_try_recv_data
     nvmet_tcp_execute_request <- sees NVMET_TCP_F_INIT_FAILED
       nvmet_tcp_queue_response <- cmd is added to the queue resp_list a 2nd time

There is a subtle case here as if this is a command that we are queueing
back is the current command being processed (which is the case if we
want to fail a command that we still need to accept the incapsule
data) we don't attach it to the resp_list. I do agree this is convoluted
and needs a cleaner failure handling...

That can't be right: the cmd shouldn't be added to the response list
twice.

Sagi, am I missing something here?

quoted

[10684.129673] nvme nvme2: starting error recovery
[10684.131074] nvme nvme2: failed nvme_keep_alive_end_io error=10
[10684.137695] nvme nvme2: Reconnecting in 10 seconds...
[10691.809468] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[10691.809519] nvmet: ctrl 1 fatal error occurred!
[10756.832004] nvme nvme2: queue 0: timeout request 0x0 type 4
[10756.832137] nvme nvme2: Connect command failed, error wo/DNR bit: 881
[10756.832199] nvme nvme2: failed to connect queue: 0 ret=881
[10756.832222] nvme nvme2: Failed reconnect attempt 1
[10756.832225] nvme nvme2: Reconnecting in 10 seconds...
[10814.174824] INFO: task kworker/11:2:11273 blocked for more than 122 seconds.
[10814.174832]       Tainted: G            E     5.16.0-rc2 #1
[10814.174834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[10814.174835] task:kworker/11:2    state:D stack:    0 pid:11273 ppid:     2
flags:0x00004000
[10814.174841] Workqueue: events nvmet_tcp_release_queue_work [nvmet_tcp]
[10814.174850] Call Trace:
[10814.174852]  <TASK>
[10814.174855]  __schedule+0x23f/0xe50
[10814.174861]  ? update_load_avg+0x77/0x4d0
[10814.174866]  schedule+0x50/0xd0
[10814.174869]  schedule_timeout+0x99/0xd0
[10814.174873]  ? raw_spin_rq_lock_nested+0x19/0x80
[10814.174876]  wait_for_completion+0x74/0xc0
[10814.174880]  __flush_work.isra.0+0x14b/0x210
[10814.174884]  ? flush_workqueue_prep_pwqs+0x110/0x110
[10814.174887]  nvmet_tcp_release_queue_work+0xc3/0x330 [nvmet_tcp]
[10814.174892]  process_one_work+0x1ad/0x310
[10814.174895]  worker_thread+0x48/0x3d0
[10814.174899]  ? process_one_work+0x310/0x310
[10814.174901]  kthread+0x155/0x180
[10814.174904]  ? set_kthread_struct+0x30/0x30
[10814.174907]  ret_from_fork+0x1f/0x30
[10814.174912]  </TASK>
[10830.558291] nvme nvme2: queue 0: timeout request 0x0 type 4
[10830.558422] nvme nvme2: Connect command failed, error wo/DNR bit: 881
[10830.558484] nvme nvme2: failed to connect queue: 0 ret=881
[10830.558507] nvme nvme2: Failed reconnect attempt 2
[10830.558510] nvme nvme2: Reconnecting in 10 seconds...
[10904.284728] nvme nvme2: queue 0: timeout request 0x0 type 4
[10904.284757] nvme nvme2: Connect command failed, error wo/DNR bit: 881
[10904.284818] nvme nvme2: failed to connect queue: 0 ret=881
[10904.284842] nvme nvme2: Failed reconnect attempt 3
[10904.284845] nvme nvme2: Reconnecting in 10 seconds...

The issue here seems to be that the controller fails the command but it
still times out for the host - which suggest that the error response is
not actually queued.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help