
Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths

From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Date: 2026-06-29 06:47:49
Also in: linux-arm-kernel, linux-coco, linux-iommu, linux-s390, lkml

Jason Gunthorpe [off-list ref] writes:
On Fri, Jun 19, 2026 at 02:36:19PM +0100, Aneesh Kumar K.V wrote:
Agreed. If the device can do encrypted DMA and requires bouncing, it
should bounce through encrypted pools. We don't support encrypted pools
now and that means, we mark the option ("mem_encrypt=on iommu=pt
swiotlb=force") not supported for now? 
?? if you don't have a CC system then the swiotlb is "encrypted"
meaning ordinary struct page system memory.

The hypervisor should not be triggering any CC special stuff here, it
is not a CC guest.

Agree we don't need to worry about swiotlb=force with a trusted device
in the GUEST for now, but it should be something to fix eventually.
If I understand this correctly, the setup Alexey is referring to here is
a bare metal system with memory encryption enabled, where the DMA address
doesn't need the C-bit cleared because that is handled in the IOMMU.
This is how I understand it too, if the iommu is turned on then it can
take the high PA with the C bit set and map it to an IOVA that matches
the device's dma limit.
( I consider this as memory encryption that is handled
transparently; the device can access any address because the encryption
details are now managed by the IOMMU).
Compared to the guest side there are some important host side differences:

 - On the host the iommu can fix it because this is only a matter of
   IOVA range not access control. On a guest even an IOMMU cannot
   permit access to private memory
 - On the host the state of the device is driven by the dma limit
   which is not set until after the driver probes. On guest the state is
   set by the tsm and device security level before the driver
   probes
 - Both flows end up using pgprot_decrypted and set_memory_decrypted()
   to create their special pools, but for completely different
   reasons.
 - The memory coming from the special swiotlb pool must NOT be used by
   a trusted device on a CC guest, while there is no problem for any
   device to use it on the host.
Agreed.
Thinking about this more, I guess we should mark the swiotlb as
cc_shared only with CC_ATTR_GUEST_MEM_ENCRYPT instead of
CC_ATTR_MEM_ENCRYPT as we have below.
The name cc_shared should be used for GUEST scenarios only.

I guess there is some merit in keeping swiotlb using "decrypted" to
mean it is using pgprot_decrypted and set_memory_decrypted(), which AMD
gives meaning to on both host and guest.
Are you suggesting changing struct io_tlb_mem::cc_shared back to
struct io_tlb_mem::unencrypted? If we want to split cc_shared and
unencrypted into two flags, I think we will add quite a lot of code
duplication.
IDK what AMD should do on the host by default. I guess it should set up
a swiotlb pool of low dma addrs "unencrypted", but not "cc_shared"?
If by low DMA address you mean an address with the C-bit cleared:
currently the SME code uses force_dma_unencrypted() as the hook to
determine whether the C-bit needs to be cleared. Therefore,
force_dma_unencrypted(dev) must be true to use such a pool.

The current code already does this and uses the swiotlb pool correctly
on SME. The challenge arises when we want to force SWIOTLB
bouncing even for devices that can handle encrypted DMA addresses (more
on that below). For such a config, force_dma_unencrypted(dev) will return
false while the swiotlb pool is marked cc_shared/decrypted = true. This
trips the new check we added.

	/* swiotlb pool is incorrect for this device */
	if (unlikely(mem->cc_shared != force_dma_unencrypted(dev)))
		return (phys_addr_t)DMA_MAPPING_ERROR;
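
To make the failure mode concrete, here is a minimal userspace sketch of
the check above. It is not kernel code: struct model_pool, struct
model_dev and model_pool_ok() are hypothetical stand-ins for
struct io_tlb_mem, struct device and the comparison against
force_dma_unencrypted(dev).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct io_tlb_mem. */
struct model_pool {
	bool cc_shared;		/* pool memory was set up shared/decrypted */
};

/* Hypothetical stand-in for struct device. */
struct model_dev {
	bool force_unencrypted;	/* models force_dma_unencrypted(dev) */
};

/*
 * Mirrors the check above: the pool is usable only when its
 * shared/decrypted state matches what the device requires.
 */
static bool model_pool_ok(const struct model_pool *mem,
			  const struct model_dev *dev)
{
	assert(mem && dev);
	return mem->cc_shared == dev->force_unencrypted;
}
```

With a pool marked cc_shared = true, model_pool_ok() holds for a device
that needs the C-bit cleared and fails for a device that can handle
encrypted DMA addresses, which is exactly the swiotlb=force case on SME.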

We can also do

	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
		/* swiotlb pool is incorrect for this device */
		if (unlikely(mem->cc_shared != force_dma_unencrypted(dev)))
			return (phys_addr_t)DMA_MAPPING_ERROR;

		/* Force attrs to match the kind of memory in the pool */
		if (mem->cc_shared)
			*attrs |= DMA_ATTR_CC_SHARED;
		else
			*attrs &= ~DMA_ATTR_CC_SHARED;
	} else {
		/*
		 * Host memory encryption where device requires an
		 * unencrypted dma_addr_t due to dma mask limit
		 */
		if (force_dma_unencrypted(dev))
			*attrs |= DMA_ATTR_CC_SHARED;
		else
			*attrs &= ~DMA_ATTR_CC_SHARED;
	}
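
As a rough illustration of how the two branches above behave, the
following userspace sketch models the same decision. Everything prefixed
with model_ or MODEL_ is made up for the example; in particular
MODEL_ATTR_CC_SHARED is a placeholder bit, not the kernel's value of
DMA_ATTR_CC_SHARED.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ATTR_CC_SHARED	(1UL << 0)	/* placeholder bit only */
#define MODEL_MAPPING_ERROR	(-1)

struct model_ctx {
	bool guest_mem_encrypt;	/* models cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) */
	bool pool_cc_shared;	/* models mem->cc_shared */
	bool force_unencrypted;	/* models force_dma_unencrypted(dev) */
};

/* Returns 0 and fixes up *attrs, or MODEL_MAPPING_ERROR on a pool mismatch. */
static int model_fixup_attrs(const struct model_ctx *c, unsigned long *attrs)
{
	bool shared;

	assert(c && attrs);

	if (c->guest_mem_encrypt) {
		/* Guest: the pool must match the device, attrs follow the pool. */
		if (c->pool_cc_shared != c->force_unencrypted)
			return MODEL_MAPPING_ERROR;
		shared = c->pool_cc_shared;
	} else {
		/* Host: no pool check, attrs follow what the device needs. */
		shared = c->force_unencrypted;
	}

	if (shared)
		*attrs |= MODEL_ATTR_CC_SHARED;
	else
		*attrs &= ~MODEL_ATTR_CC_SHARED;
	return 0;
}
```

In this model a host device that can handle encrypted addresses is no
longer rejected when the pool is marked shared; it just ends up with the
shared attribute cleared, while a guest with a mismatched pool still
gets the mapping error.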


Here I see value in having DMA_ATTR_UNENCRYPTED. The question is whether
we need to split this into two flags and introduce the resulting code
duplication.
But if we are operating on the host then this pool is not limited to
only T=0 devices, every device can "safely" use it. (ignoring that this
destroys the security that memory encryption on bare metal was supposed
to provide)
Now we have the case of host memory encryption where the C-bit needs to
be cleared in dma_addr_t. That requires special handling in the kernel, and
I believe we need to mark swiotlb as unencrypted in this configuration.
I think we need to split the two things up, they have different
behaviors and need different flags and labels to make it all work
right.
I am still not clear whether there is a config option or runtime check
we can use to identify this case.
The dma api has to detect, after the driver sets the dma limit, that
none of system memory is usable when:
 - The direct path is being used
 - phys to dma for 0 is outside the dma limit

Then it should assume the arch has set up a swiotlb pool for it to use
to fix the high memory problem.

Similar hackery would be needed in the dma alloc path to know that
decrypted can be used to fix the high memory problem like for GUEST.

I guess some 'dev_cannot_reach_memory(dev)' sort of test in a
few key places? Setup with a static branch to be a nop on everything
but AMD, compiled out on every other arch.
If we are not able to reach the memory because of the memory encryption
bit, then isn't dev_cannot_reach_memory(dev) the same as
force_dma_unencrypted(dev)? If so, that is how it is already done.
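
For comparison, the two conditions quoted above (direct path in use, and
phys to dma for 0 falling outside the dma limit) could be sketched in
userspace roughly as below. This is only a model under my assumptions:
the model_ names are hypothetical, and phys_to_dma() is approximated as
OR-ing the C-bit mask into the physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant struct device state. */
struct model_dev {
	uint64_t dma_limit;	/* highest address the device can emit */
	bool uses_direct;	/* direct mapping path, no IOMMU translation */
};

/* Models phys_to_dma() on a host where the C-bit is part of the address. */
static uint64_t model_phys_to_dma(uint64_t paddr, uint64_t c_bit_mask)
{
	return paddr | c_bit_mask;
}

/*
 * True when no system memory is reachable: the direct path is used and
 * even physical address 0 maps above the device's dma limit.
 */
static bool model_dev_cannot_reach_memory(const struct model_dev *dev,
					  uint64_t c_bit_mask)
{
	assert(dev);
	return dev->uses_direct &&
	       model_phys_to_dma(0, c_bit_mask) > dev->dma_limit;
}
```

For example, with a C-bit mask of 1ULL << 47 (an example position only),
a device with a 32-bit dma limit on the direct path is reported as unable
to reach memory, while a 64-bit capable device or a device behind a
translating IOMMU is not.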

I am wondering whether we can keep this simpler by ignoring the
swiotlb=force kernel parameter and keeping cc_shared as it is, even
though that can be confusing when looking at SME.

The three configurations we need to consider here are:

1) SEV-SNP guest
2) SME host with iommu=translated
3) SME host with iommu=passthrough

IIUC, all of the above work with the current code because we mark the
swiotlb as cc_shared/decrypted when CC_ATTR_MEM_ENCRYPT is set (i.e.,
this applies to an SME host as well).

The challenge arises when the user forces swiotlb bouncing with the
swiotlb=force command-line option. At that point, all devices, including
those whose DMA mask can handle encrypted DMA addresses, are forced to
use SWIOTLB. That becomes a problem because SWIOTLB is marked as
decrypted by default.

How about something like the following?

x86/dma: Disable forced SWIOTLB bouncing for SME IOMMU passthrough

With host memory encryption and IOMMU passthrough, DMA address handling
depends on whether a device can address the C-bit. Devices that cannot
address it need DMA addresses with the C-bit cleared, while devices that
can address encrypted memory should keep using encrypted DMA addresses.

The default swiotlb pool is marked shared when memory encryption is active.
Forcing all devices through that pool would also force devices capable of
encrypted DMA to use shared mappings. Clear the global swiotlb-force-bounce
state in this mode, and warn when this overrides an explicit swiotlb=force
command-line request.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

modified   arch/x86/kernel/pci-dma.c
@@ -51,8 +51,24 @@ static void __init pci_swiotlb_detect(void)
 	 * Set swiotlb to 1 so that bounce buffers are allocated and used for
 	 * devices that can't support DMA to encrypted memory.
 	 */
-	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
+	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
 		x86_swiotlb_enable = true;
+		/*
+		 * With host memory encryption and IOMMU passthrough, devices
+		 * that cannot address the C-bit need DMA addresses with the
+		 * C-bit cleared, while devices that can address encrypted
+		 * memory should keep using encrypted DMA addresses.
+		 *
+		 * The default SWIOTLB pool is marked shared when memory
+		 * encryption is active, so forcing all devices through it would
+		 * also force devices that support encrypted DMA to use shared
+		 * mappings. Disable global forced bouncing in this mode.
+		 */
+		if (iommu_default_passthrough() &&
+		    clear_swiotlb_force_bounce())
+			pr_warn("Ignoring swiotlb=force with host memory encryption and "
+				"IOMMU passthrough\n");
+	}
 
 	/*
 	 * Guest with guest memory encryption currently perform all DMA through
modified   include/linux/swiotlb.h
@@ -40,6 +40,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	int (*remap)(void *tlb, unsigned long nslabs));
 extern void __init swiotlb_update_mem_attributes(void);
+bool __init clear_swiotlb_force_bounce(void);
 
 #ifdef CONFIG_SWIOTLB
 
modified   kernel/dma/swiotlb.c
@@ -208,6 +208,15 @@ unsigned long swiotlb_size_or_default(void)
 	return default_nslabs << IO_TLB_SHIFT;
 }
 
+bool __init clear_swiotlb_force_bounce(void)
+{
+	if (!swiotlb_force_bounce)
+		return false;
+
+	swiotlb_force_bounce = false;
+	return true;
+}
+
 void __init swiotlb_adjust_size(unsigned long size)
 {
 	/*