RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
From: Michael Kelley <hidden>
Date: 2021-08-20 15:40:28
Also in:
linux-hyperv, linux-iommu, linux-scsi, lkml, netdev, xen-devel
From: hch@lst.de <hch@lst.de> Sent: Thursday, August 19, 2021 9:33 PM
On Thu, Aug 19, 2021 at 06:17:40PM +0000, Michael Kelley wrote:quoted
quoted
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + cmd_request->dma_range = kcalloc(hvpg_count, + sizeof(*cmd_request->dma_range), + GFP_ATOMIC);With this patch, it appears that storvsc_queuecommand() is always doing bounce buffering, even when running in a non-isolated VM. The dma_range is always allocated, and the inner loop below does the dma mapping for every I/O page. The corresponding code in storvsc_on_channel_callback() that does the dma unmap allows for the dma_range to be NULL, but that never happens.Maybe I'm missing something in the hyperv code, but I don't think dma_map_page would bounce buffer for the non-isolated case. It will just return the physical address.
OK, right. In the isolated VM case, the swiotlb is in force mode and will do bounce buffering. In the non-isolated case, dma_map_page_attrs() -> dma_direct_map_page() does a lot of checking but eventually just returns the physical address. As this patch is currently coded, it adds a fair amount of overhead here in storvsc_queuecommand(), plus the overhead of the dma mapping function deciding to use the identity mapping. But if dma_map_sg() is used and the code is simplified a bit, the overhead will be less in general and will be per sgl entry instead of per page.
quoted
quoted
+ if (offset_in_hvpg) { + payload->range.offset = dma & ~HV_HYP_PAGE_MASK; + offset_in_hvpg = 0; + }I'm not clear on why payload->range.offset needs to be set again. Even after the dma mapping is done, doesn't the offset in the first page have to be the same? If it wasn't the same, Hyper-V wouldn't be able to process the PFN list correctly. In fact, couldn't the above code just always set offset_in_hvpg = 0?Careful. DMA mapping is supposed to keep the offset in the page, but for that the DMA mapping code needs to know what the device considers a "page". For that the driver needs to set the min_align_mask field in struct device_dma_parameters.
I see that the swiotlb code gets and uses the min_align_mask field. But the NVME driver is the only driver that ever sets it, so the value is zero in all other cases. Does swiotlb just use PAGE_SIZE in that that case? I couldn't tell from a quick glance at the swiotlb code.
quoted
The whole approach here is to do dma remapping on each individual page of the I/O buffer. But wouldn't it be possible to use dma_map_sg() to map each scatterlist entry as a unit? Each scatterlist entry describes a range of physically contiguous memory. After dma_map_sg(), the resulting dma address must also refer to a physically contiguous range in the swiotlb bounce buffer memory. So at the top of the "for" loop over the scatterlist entries, do dma_map_sg() if we're in an isolated VM. Then compute the hvpfn value based on the dma address instead of sg_page(). But everything else is the same, and the inner loop for populating the pfn_arry is unmodified. Furthermore, the dma_range array that you've added is not needed, since scatterlist entries already have a dma_address field for saving the mapped address, and dma_unmap_sg() uses that field.Yes, I think dma_map_sg is the right thing to use here, probably even for the non-isolated case so that we can get the hv drivers out of their little corner and into being more like a normal kernel driver. That is, use the scsi_dma_map/scsi_dma_unmap helpers, and then iterate over the dma addresses one page at a time using for_each_sg_dma_page.
Doing some broader revisions to the Hyper-V storvsc driver is up next on my to-do list. Rather than significantly modifying the non-isolated case in this patch set, I'd suggest factoring it into my broader revisions. Michael