Thread (3 messages) 3 messages, 2 authors, 2021-01-19

Re: [PATCH] nvme: fix handling mapping failure

From: Christoph Hellwig <hch@lst.de>
Date: 2021-01-19 20:14:55
Also in: linux-nvme, lkml
Subsystem: nvm express driver, the rest · Maintainers: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Linus Torvalds

On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
This patch ensures that when `nvme_map_data()` fails to map the
addresses in a scatter/gather list:

* The addresses are not incorrectly unmapped. The underlying
scatter/gather code unmaps the addresses after detecting a failure.
Thus, unmapping them again in the driver is a bug.
* The DMA pool allocations are not deallocated when they were never
allocated.

The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.

* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails part way through the scatter gather/list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
  succeeded.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
  double unmap, which is a bug.

Before this patch, I observed intermittent application- and VM-level
failures when running a benchmark, fio, in an AMD SEV guest. This patch
resolves the failures.
I think the right way to fix this is to just do a proper unwind insted
of calling a catchall function.  Can you try this patch?
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb8c..47d7075053b6b2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
-		goto out;
+		goto out_free_sg;
 
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
@@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
-		goto out;
+		goto out_free_sg;
 
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-out:
 	if (ret != BLK_STS_OK)
-		nvme_unmap_data(dev, req);
+		goto out_dma_unmap;
+	return BLK_STS_OK;
+
+out_dma_unmap:
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+				    rq_dma_dir(req));
+	else
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+out_free_sg:
+	mempool_free(iod->sg, dev->iod_mempool);
 	return ret;
 }
 
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help