Re: [PATCH 5/7] nvme/nvme-fabrics: introduce nvmf_error_recovery_work API
From: Hannes Reinecke <hare@suse.de>
Date: 2021-10-19 13:34:50
On 10/18/21 3:40 PM, Max Gurtovoy wrote:
quoted hunk ↗ jump to hunk
Error recovery work is duplicated in RDMA and TCP transports. Move this logic to common code. For that, introduce 2 new ctrl ops to teardown IO and admin queue. Also update the RDMA/TCP transport drivers to use this API and remove the duplicated code. Reviewed-by: Israel Rukshin <redacted> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> --- drivers/nvme/host/fabrics.c | 23 +++++++++++++++ drivers/nvme/host/fabrics.h | 1 + drivers/nvme/host/nvme.h | 4 +++ drivers/nvme/host/rdma.c | 56 ++++++++++++++++--------------------- drivers/nvme/host/tcp.c | 56 +++++++++++++++---------------------- 5 files changed, 75 insertions(+), 65 deletions(-)diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 2edd086fa922..544195369c97 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c@@ -493,6 +493,29 @@ void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl) } EXPORT_SYMBOL_GPL(nvmf_reconnect_or_remove); +void nvmf_error_recovery_work(struct work_struct *work) +{ + struct nvme_ctrl *ctrl = container_of(work, + struct nvme_ctrl, err_work); + + nvme_stop_keep_alive(ctrl); + ctrl->ops->teardown_ctrl_io_queues(ctrl); + /* unquiesce to fail fast pending requests */ + nvme_start_queues(ctrl); + ctrl->ops->teardown_ctrl_admin_queue(ctrl); + blk_mq_unquiesce_queue(ctrl->admin_q); + + if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) { + /* state change failure is ok if we started ctrl delete */ + WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING && + ctrl->state != NVME_CTRL_DELETING_NOIO); + return; + } + + nvmf_reconnect_or_remove(ctrl); +} +EXPORT_SYMBOL_GPL(nvmf_error_recovery_work); + void nvmf_error_recovery(struct nvme_ctrl *ctrl) { if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h index 3d8ec7133fc8..8655eff74ed0 100644 --- a/drivers/nvme/host/fabrics.h +++ b/drivers/nvme/host/fabrics.h@@ -190,6 +190,7 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size); bool nvmf_should_reconnect(struct nvme_ctrl *ctrl); void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl); void nvmf_error_recovery(struct nvme_ctrl *ctrl); +void nvmf_error_recovery_work(struct work_struct *work); bool nvmf_ip_options_match(struct nvme_ctrl *ctrl, struct nvmf_ctrl_options *opts);diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index f9e1ce93d61d..1573edf6e97f 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h@@ -493,6 +493,10 @@ struct nvme_ctrl_ops { void (*submit_async_event)(struct nvme_ctrl *ctrl); void (*delete_ctrl)(struct nvme_ctrl *ctrl); int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); + + /* Fabrics only */ + void (*teardown_ctrl_io_queues)(struct nvme_ctrl *ctrl); + void (*teardown_ctrl_admin_queue)(struct nvme_ctrl *ctrl); }; /*
As noted by Sagi, this abstraction really is awkward. Why not have void (*teardown_ctrl_queues)(struct nvme_ctrl *ctrl, int qid); and let the callback figure out whether it's for the admin queue (ie qid == 8) or the normal I/O queue? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer