Re: [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-11-14 10:28:57
Subsystem:
nvm express target driver, the rest · Maintainers:
Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni, Linus Torvalds
On 11/12/21 12:54 PM, Maurizio Lombardi wrote:
quoted hunk ↗ jump to hunk
Hi Sagi, On Thu, Nov 04, 2021 at 02:59:53PM +0200, Sagi Grimberg wrote:quoted
Right, after the call to cancel_work_sync we will know that io_work is not running. Note that it can run as a result of a backend completion but that is ok and we do want to let it run and return completion to the host, but the socket should already be shut down for recv, so we cannot get any other byte from the network.I did some tests and I found out that kernel_recvmsg() sometimes returns data even if the socket has been already shut down (maybe it's data it received before the call to kernel_sock_shutdown() and waiting in some internal buffer?). So when nvmet_sq_destroy() triggered io_work, recvmsg() still returned data and the kernel crashed again despite the socket was closed. Therefore, I think that after we shut down the socket we should let io_work run and requeue itself until it finishes its job and no more data is returned by recvmsg(), one way to achieve this is to repeatedly call flush_work() until it returns false. Right now I am testing the patch below and it works perfectly. Note that when the socket is closed recvmsg() might return 0, nvmet_tcp_try_recv_data() should return -EAGAIN in that case otherwise we end up in an infinite loop (io_work will countinously requeue itself).diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 2f03a94725ae..7b441071c6b9 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c@@ -1139,8 +1139,10 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue) while (msg_data_left(&cmd->recv_msg)) { ret = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg, cmd->recv_msg.msg_flags); - if (ret <= 0) + if (ret < 0) return ret; + else if (ret == 0) + return -EAGAIN; cmd->pdu_recv += ret; cmd->rbytes_done += ret;@@ -1446,8 +1450,10 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w) list_del_init(&queue->queue_list); mutex_unlock(&nvmet_tcp_queue_mutex); + kernel_sock_shutdown(queue->sock, SHUT_RD); + nvmet_tcp_restore_socket_callbacks(queue);
I think you should shutdown the socket after you restore the socket callbacks, and after that cancel_work_sync to deny a self-requeue.
- flush_work(&queue->io_work); + while (flush_work(&queue->io_work)); nvmet_tcp_uninit_data_in_cmds(queue); nvmet_sq_destroy(&queue->nvme_sq);
How about something like this: --
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 6eb0b3153477..6425375faf5b 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c@@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
mutex_unlock(&nvmet_tcp_queue_mutex);
nvmet_tcp_restore_socket_callbacks(queue);
- flush_work(&queue->io_work);
+ /* stop accepting incoming data */
+ kernel_sock_shutdown(queue->sock, SHUT_RD);
+ cancel_work_sync(&queue->io_work);
nvmet_tcp_uninit_data_in_cmds(queue);
nvmet_sq_destroy(&queue->nvme_sq);
--