Re: [PATCH v3] nvme-tcp: Do not reset transport on data digest errors
From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-08-30 11:25:10
Also in:
lkml
On 8/26/21 1:21 AM, Daniel Wagner wrote:
> The spec says
>
>   7.4.6.1 Digest Error handling
>
>   When a host detects a data digest error in a C2HData PDU, that host
>   shall continue processing C2HData PDUs associated with the command and
>   when the command processing has completed, if a successful status was
>   returned by the controller, the host shall fail the command with a
>   non-fatal transport error.
>
> Currently the transport is reseted when a data digest error is
> detected. Instead, when a digest error is detected, mark the final
> status as NVME_SC_DATA_XFER_ERROR and let the upper layer handle
> the error.
>
> In order to keep track of the final result maintain a status field in
> nvme_tcp_request object and use it to overwrite the completion queue
> status (which might be successful even though a digest error has been
> detected) when completing the request.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <redacted>
> ---
>
> The status member placed so that it fills up a hole in struct
> nvme_tcp_request:
>
> struct nvme_tcp_request {
>         struct nvme_request        req;                  /*     0    32 */
>         void *                     pdu;                  /*    32     8 */
>         struct nvme_tcp_queue *    queue;                /*    40     8 */
>         u32                        data_len;             /*    48     4 */
>         u32                        pdu_len;              /*    52     4 */
>         u32                        pdu_sent;             /*    56     4 */
>         u16                        ttag;                 /*    60     2 */
>         u16                        status;               /*    62     2 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         struct list_head           entry;                /*    64    16 */
>         struct llist_node          lentry;               /*    80     8 */
>         __le32                     ddgst;                /*    88     4 */
>
>         /* XXX 4 bytes hole, try to pack */
>
>         struct bio *               curr_bio;             /*    96     8 */
>         struct iov_iter            iter;                 /*   104    40 */
>         /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
>         size_t                     offset;               /*   144     8 */
>         size_t                     data_sent;            /*   152     8 */
>         enum nvme_tcp_send_state   state;                /*   160     4 */
>
>         /* size: 168, cachelines: 3, members: 16 */
>         /* sum members: 160, holes: 1, sum holes: 4 */
>         /* padding: 4 */
>         /* last cacheline: 40 bytes */
> };
>
> v3:
>  - initialize req->status in nvme_tcp_setup_cmd_pdu()
>  - add rb tag from Hannes
> v2:
>  - https://lore.kernel.org/linux-nvme/20210825124259.28707-1-dwagner@suse.de/
>  - moved 'status' from nvme_tcp_queue to nvme_tcp_request.
> v1:
>  - https://lore.kernel.org/linux-nvme/20210805121541.77613-1-dwagner@suse.de/
>
>  drivers/nvme/host/tcp.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 645025620154..29ef0f74f620 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -45,6 +45,7 @@ struct nvme_tcp_request {
>  	u32			pdu_len;
>  	u32			pdu_sent;
>  	u16			ttag;
> +	u16			status;
>  	struct list_head	entry;
>  	struct llist_node	lentry;
>  	__le32			ddgst;
> @@ -485,7 +486,9 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
>  static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
>  		struct nvme_completion *cqe)
>  {
> +	struct nvme_tcp_request *req;
>  	struct request *rq;
> +	u16 status;
>  
>  	rq = nvme_find_rq(nvme_tcp_tagset(queue), cqe->command_id);
>  	if (!rq) {
> @@ -496,7 +499,12 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
>  		return -EINVAL;
>  	}
>  
> -	if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
> +	req = blk_mq_rq_to_pdu(rq);
> +	status = req->status;
> +	if (status == NVME_SC_SUCCESS)
> +		status = cqe->status;

Maybe more intuitive to skip the local status variable?

	/* */
	if (req->status == NVME_SC_SUCCESS)
		req->status = cqe->status;

This way it is always consistent completing with req->status.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
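
For illustration only, the two ways of picking the final status discussed in this thread can be modelled in plain user-space C. This is a rough sketch, not the driver code: struct model_request and the model_*() helpers are hypothetical stand-ins, the status codes are treated as plain host-endian integers (no phase bit, no __le16 handling), and the numeric values 0x0 and 0x4 are assumed to match NVME_SC_SUCCESS and NVME_SC_DATA_XFER_ERROR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror NVME_SC_SUCCESS / NVME_SC_DATA_XFER_ERROR. */
enum {
	SC_SUCCESS         = 0x0,
	SC_DATA_XFER_ERROR = 0x4,
};

/* Hypothetical stand-in for struct nvme_tcp_request: status field only. */
struct model_request {
	uint16_t status;
};

/* Records a data digest mismatch seen while receiving C2HData PDUs. */
static void model_digest_error(struct model_request *req)
{
	req->status = SC_DATA_XFER_ERROR;
}

/* Variant in the patch: choose the final status through a local variable. */
static uint16_t model_complete_local(const struct model_request *req,
				     uint16_t cqe_status)
{
	uint16_t status = req->status;

	if (status == SC_SUCCESS)
		status = cqe_status;
	return status;
}

/* Variant from the review: fold the CQE status into req->status itself. */
static uint16_t model_complete_in_place(struct model_request *req,
					uint16_t cqe_status)
{
	if (req->status == SC_SUCCESS)
		req->status = cqe_status;
	return req->status;
}

/* Walks through the digest-error case: the controller reports success,
 * but the request must still complete with the transfer error. */
static void model_demo(void)
{
	struct model_request req = { .status = SC_SUCCESS };

	model_digest_error(&req);
	assert(model_complete_local(&req, SC_SUCCESS) == SC_DATA_XFER_ERROR);
	assert(model_complete_in_place(&req, SC_SUCCESS) == SC_DATA_XFER_ERROR);
}
```

Both variants return the same value for the same inputs. The only difference is that the in-place form also leaves the final status in req->status, so every later reader of the request sees the value it was completed with.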