Thread (9 messages) 9 messages, 3 authors, 2021-09-13

Re: [PATCH 1/1] nvme-rdma: destroy cm id before destroy qp to avoid use after free

From: liruozhu <hidden>
Date: 2021-09-10 02:51:06

Hi Max,

Yes, I did encounter this problem. I posted the detailed log in the cover letter of this patch.

I tested it with Huawei storage array, kill nvme target processes(we use self-developed user mode nvme target)
while inject PCIE AER errors on array HBA card repeatedly.

I’m not sure if there is a way to easily reproduce it using standard server with kernel target.
But I still think that the host side should try to avoid this possibility of crashing.

Thanks.

On 2021/9/10 1:32, Max Gurtovoy wrote:
On 9/9/2021 5:05 PM, Christoph Hellwig wrote:
quoted
This looks reasonable to me.  Sagi, Max: any comments?

On Mon, Sep 06, 2021 at 11:51:34AM +0800, Ruozhu Li wrote:
quoted
We should always destroy cm_id before destroy qp to avoid to get cma
event after qp was destroyed, which may lead to use after free.
Do you really encounter use-after-free ?

I think the code looks correct but from experience with RDMA-CM I 
would like to know which tests did you run with this code ?

interesting tests are: port toggling with/without IO, unload 
mlx5/other_rdma_hw drivers during connection establishment or during 
IO, reboot target, reboot target during connection establishment, 
reboot target during IO.

Thanks.

quoted
quoted
In RDMA connection establishment error flow, don't destroy qp in cm
event handler.Just report cm_error to upper level, qp will be destroy
in nvme_rdma_alloc_queue() after destroy cm id.

Signed-off-by: Ruozhu Li <redacted>
---
  drivers/nvme/host/rdma.c | 16 +++-------------
  1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a68704e39084..042c594bc57e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -656,8 +656,8 @@ static void nvme_rdma_free_queue(struct 
nvme_rdma_queue *queue)
      if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
          return;
  -    nvme_rdma_destroy_queue_ib(queue);
      rdma_destroy_id(queue->cm_id);
+    nvme_rdma_destroy_queue_ib(queue);
      mutex_destroy(&queue->queue_lock);
  }
  @@ -1815,14 +1815,10 @@ static int 
nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
      for (i = 0; i < queue->queue_size; i++) {
          ret = nvme_rdma_post_recv(queue, &queue->rsp_ring[i]);
          if (ret)
-            goto out_destroy_queue_ib;
+            return ret;
      }
        return 0;
-
-out_destroy_queue_ib:
-    nvme_rdma_destroy_queue_ib(queue);
-    return ret;
  }
    static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,
@@ -1916,14 +1912,10 @@ static int nvme_rdma_route_resolved(struct 
nvme_rdma_queue *queue)
      if (ret) {
          dev_err(ctrl->ctrl.device,
              "rdma_connect_locked failed (%d).\n", ret);
-        goto out_destroy_queue_ib;
+        return ret;
      }
        return 0;
-
-out_destroy_queue_ib:
-    nvme_rdma_destroy_queue_ib(queue);
-    return ret;
  }
    static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
@@ -1954,8 +1946,6 @@ static int nvme_rdma_cm_handler(struct 
rdma_cm_id *cm_id,
      case RDMA_CM_EVENT_ROUTE_ERROR:
      case RDMA_CM_EVENT_CONNECT_ERROR:
      case RDMA_CM_EVENT_UNREACHABLE:
-        nvme_rdma_destroy_queue_ib(queue);
-        fallthrough;
      case RDMA_CM_EVENT_ADDR_ERROR:
          dev_dbg(queue->ctrl->ctrl.device,
              "CM error event %d\n", ev->event);
-- 
2.16.4


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
---end quoted text---
.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help