Thread (34 messages) 34 messages, 7 authors, 2021-10-20

Re: [PATCH 5/7] nvme/nvme-fabrics: introduce nvmf_error_recovery_work API

From: Hannes Reinecke <hare@suse.de>
Date: 2021-10-19 13:34:50

On 10/18/21 3:40 PM, Max Gurtovoy wrote:
quoted hunk ↗ jump to hunk
Error recovery work is duplicated in RDMA and TCP transports. Move this
logic to common code. For that, introduce 2 new ctrl ops to teardown IO
and admin queue.

Also update the RDMA/TCP transport drivers to use this API and remove
the duplicated code.

Reviewed-by: Israel Rukshin <redacted>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
  drivers/nvme/host/fabrics.c | 23 +++++++++++++++
  drivers/nvme/host/fabrics.h |  1 +
  drivers/nvme/host/nvme.h    |  4 +++
  drivers/nvme/host/rdma.c    | 56 ++++++++++++++++---------------------
  drivers/nvme/host/tcp.c     | 56 +++++++++++++++----------------------
  5 files changed, 75 insertions(+), 65 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2edd086fa922..544195369c97 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -493,6 +493,29 @@ void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl)
  }
  EXPORT_SYMBOL_GPL(nvmf_reconnect_or_remove);
  
+void nvmf_error_recovery_work(struct work_struct *work)
+{
+	struct nvme_ctrl *ctrl = container_of(work,
+				struct nvme_ctrl, err_work);
+
+	nvme_stop_keep_alive(ctrl);
+	ctrl->ops->teardown_ctrl_io_queues(ctrl);
+	/* unquiesce to fail fast pending requests */
+	nvme_start_queues(ctrl);
+	ctrl->ops->teardown_ctrl_admin_queue(ctrl);
+	blk_mq_unquiesce_queue(ctrl->admin_q);
+
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
+		/* state change failure is ok if we started ctrl delete */
+		WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
+			     ctrl->state != NVME_CTRL_DELETING_NOIO);
+		return;
+	}
+
+	nvmf_reconnect_or_remove(ctrl);
+}
+EXPORT_SYMBOL_GPL(nvmf_error_recovery_work);
+
  void nvmf_error_recovery(struct nvme_ctrl *ctrl)
  {
  	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 3d8ec7133fc8..8655eff74ed0 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -190,6 +190,7 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
  bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
  void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl);
  void nvmf_error_recovery(struct nvme_ctrl *ctrl);
+void nvmf_error_recovery_work(struct work_struct *work);
  bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
  		struct nvmf_ctrl_options *opts);
  
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f9e1ce93d61d..1573edf6e97f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -493,6 +493,10 @@ struct nvme_ctrl_ops {
  	void (*submit_async_event)(struct nvme_ctrl *ctrl);
  	void (*delete_ctrl)(struct nvme_ctrl *ctrl);
  	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+
+	/* Fabrics only */
+	void (*teardown_ctrl_io_queues)(struct nvme_ctrl *ctrl);
+	void (*teardown_ctrl_admin_queue)(struct nvme_ctrl *ctrl);
  };
  
  /*
As noted by Sagi, this abstraction really is awkward.
Why not have

void (*teardown_ctrl_queues)(struct nvme_ctrl *ctrl, int qid);

and let the callback figure out whether it's for the admin queue
(ie qid == 8) or the normal I/O queue?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help