Re: [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-11-15 10:19:06
quoted hunk ↗ jump to hunk
quoted
--diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 6eb0b3153477..65210dec3f1a 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c@@ -1436,6 +1436,8 @@ static void nvmet_tcp_release_queue_work(structwork_struct *w) mutex_unlock(&nvmet_tcp_queue_mutex); nvmet_tcp_restore_socket_callbacks(queue); + /* stop accepting incoming data */ + queue->rcv_state = NVMET_TCP_RECV_ERR; flush_work(&queue->io_work); nvmet_tcp_uninit_data_in_cmds(queue); --Ok I can repeat the test, but you probably want to do this instead:diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index fb72e2d67fd5..d21b525fd4cb 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c@@ -1450,7 +1450,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w) mutex_unlock(&nvmet_tcp_queue_mutex); nvmet_tcp_restore_socket_callbacks(queue); - flush_work(&queue->io_work); + cancel_work_sync(&queue->io_work); + /* stop accepting incoming data */ + queue->rcv_state = NVMET_TCP_RECV_ERR; nvmet_tcp_uninit_data_in_cmds(queue); nvmet_sq_destroy(&queue->nvme_sq);If you don't perform a cancel_work_sync() you may race against a running io_work thread that may overwrite rcv_state with some other value.
Yes, that makes sense.