Re: [PATCH 1/1] nvme-rdma: destroy cm id before destroy qp to avoid use after free
From: liruozhu <hidden>
Date: 2021-09-10 02:51:06
Hi Max, Yes, I did encounter this problem. I posted the detailed log in the cover letter of this patch. I tested it with Huawei storage array, kill nvme target processes(we use self-developed user mode nvme target) while inject PCIE AER errors on array HBA card repeatedly. I’m not sure if there is a way to easily reproduce it using standard server with kernel target. But I still think that the host side should try to avoid this possibility of crashing. Thanks. On 2021/9/10 1:32, Max Gurtovoy wrote:
On 9/9/2021 5:05 PM, Christoph Hellwig wrote:quoted
This looks reasonable to me. Sagi, Max: any comments? On Mon, Sep 06, 2021 at 11:51:34AM +0800, Ruozhu Li wrote:quoted
We should always destroy cm_id before destroy qp to avoid to get cma event after qp was destroyed, which may lead to use after free.Do you really encounter use-after-free ? I think the code looks correct but from experience with RDMA-CM I would like to know which tests did you run with this code ? interesting tests are: port toggling with/without IO, unload mlx5/other_rdma_hw drivers during connection establishment or during IO, reboot target, reboot target during connection establishment, reboot target during IO. Thanks.quoted
quoted
In RDMA connection establishment error flow, don't destroy qp in cm event handler.Just report cm_error to upper level, qp will be destroy in nvme_rdma_alloc_queue() after destroy cm id. Signed-off-by: Ruozhu Li <redacted> --- drivers/nvme/host/rdma.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index a68704e39084..042c594bc57e 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c@@ -656,8 +656,8 @@ static void nvme_rdma_free_queue(structnvme_rdma_queue *queue) if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags)) return; - nvme_rdma_destroy_queue_ib(queue); rdma_destroy_id(queue->cm_id); + nvme_rdma_destroy_queue_ib(queue); mutex_destroy(&queue->queue_lock); } @@ -1815,14 +1815,10 @@ static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue) for (i = 0; i < queue->queue_size; i++) { ret = nvme_rdma_post_recv(queue, &queue->rsp_ring[i]); if (ret) - goto out_destroy_queue_ib; + return ret; } return 0; - -out_destroy_queue_ib: - nvme_rdma_destroy_queue_ib(queue); - return ret; } static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,@@ -1916,14 +1912,10 @@ static int nvme_rdma_route_resolved(structnvme_rdma_queue *queue) if (ret) { dev_err(ctrl->ctrl.device, "rdma_connect_locked failed (%d).\n", ret); - goto out_destroy_queue_ib; + return ret; } return 0; - -out_destroy_queue_ib: - nvme_rdma_destroy_queue_ib(queue); - return ret; } static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,@@ -1954,8 +1946,6 @@ static int nvme_rdma_cm_handler(structrdma_cm_id *cm_id, case RDMA_CM_EVENT_ROUTE_ERROR: case RDMA_CM_EVENT_CONNECT_ERROR: case RDMA_CM_EVENT_UNREACHABLE: - nvme_rdma_destroy_queue_ib(queue); - fallthrough; case RDMA_CM_EVENT_ADDR_ERROR: dev_dbg(queue->ctrl->ctrl.device, "CM error event %d\n", ev->event); -- 2.16.4 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme---end quoted text---.
_______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme