Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli <f.fainelli@gmail.com>
Date: 2021-01-13 04:42:40
Also in:
linux-devicetree, linuxppc-dev, lkml, xen-devel
On 1/12/2021 8:25 PM, Tomasz Figa wrote:
On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli [off-list ref] wrote:
On 1/12/2021 6:29 PM, Tomasz Figa wrote:
Hi Florian,

On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli [off-list ref] wrote:
On 1/11/21 11:48 PM, Claire Chang wrote:
On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli [off-list ref] wrote:
On 1/7/21 9:42 AM, Claire Chang wrote:
Can you explain how ATF gets involved and to what extent it does help, besides enforcing a secure region from the ARM CPU's perspective? Does the PCIe root complex not have an IOMMU but can somehow be denied access to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is still some sort of basic protection that the HW enforces, right?

We need the ATF support for memory MPU (memory protection unit). Restricted DMA (with reserved-memory in dts) makes sure the predefined memory region is for PCIe DMA only, but we still need the MPU to lock down PCIe access to that specific region.

OK so you do have a protection unit of some sort to enforce which region in DRAM the PCIe bridge is allowed to access, that makes sense, otherwise the restricted DMA region would only be a hint but nothing you can really enforce. This is almost entirely analogous to our systems then.

Here is the example of setting the MPU:
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

There may be some value in standardizing on an ARM SMCCC call then since you already support two different SoC vendors.

On Broadcom STB SoCs we have had something similar for a while however and while we don't have an IOMMU for the PCIe bridge, we do have a basic protection mechanism whereby we can configure a region in DRAM to be PCIe read/write and CPU read/write which then gets used as the PCIe inbound region for the PCIe EP. By default the PCIe bridge is not allowed access to DRAM so we must call into a security agent to allow the PCIe bridge to access the designated DRAM region.

We have done this using a private CMA area region assigned via Device Tree, assigned with a and requiring the PCIe EP driver to use dma_alloc_from_contiguous() in order to allocate from this device private CMA area. The only drawback with that approach is that it requires knowing how much memory you need up front for buffers and DMA descriptors that the PCIe EP will need to process. The problem is that it requires driver modifications and that does not scale over the number of PCIe EP drivers, some we absolutely do not control, but there is no need to bounce buffer. Your approach scales better across PCIe EP drivers however it does require bounce buffering which could be a performance hit.

Only the streaming DMA (map/unmap) needs bounce buffering.

True, and typically only on transmit since you don't really control where the sk_buff are allocated from, right? On RX since you need to hand buffer addresses to the WLAN chip prior to DMA, you can allocate them from a pool that already falls within the restricted DMA region, right?

Right, but applying bounce buffering to RX will make it more secure. The device won't be able to modify the content after unmap. Just like what iommu_unmap does.

Sure, however the goals of using bounce buffering equally apply to RX and TX in that this is the only layer sitting between a stack (block, networking, USB, etc.) and the underlying device driver that scales well in order to massage a dma_addr_t to be within a particular physical range.
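
[Editorial sketch] The streaming map/unmap bouncing discussed above can be modelled in user space roughly as follows. This is only an illustration under stated assumptions: `restricted_pool`, `pool_map()` and `pool_unmap()` are hypothetical names, not kernel APIs, the byte array stands in for the reserved-memory region, and the bump allocator never frees slots.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define POOL_SIZE 4096

enum dir { TO_DEVICE, FROM_DEVICE };

struct restricted_pool {
    unsigned char mem[POOL_SIZE]; /* stands in for the reserved-memory region */
    size_t used;                  /* bump-allocator offset, never released */
};

/*
 * "Map": reserve a slot inside the pool and, for TX, copy the payload in.
 * Returns the slot offset (a stand-in for a dma_addr_t), or -1 if the
 * pool cannot hold len more bytes.
 */
long pool_map(struct restricted_pool *p, const void *buf, size_t len,
              enum dir d)
{
    long slot;

    if (len > POOL_SIZE - p->used)
        return -1;
    slot = (long)p->used;
    if (d == TO_DEVICE)
        memcpy(p->mem + slot, buf, len);
    p->used += len;
    return slot;
}

/*
 * "Unmap": for RX, copy what the device wrote into the slot back out to
 * the caller's buffer, which the device itself cannot reach.
 */
void pool_unmap(struct restricted_pool *p, long slot, void *buf, size_t len,
                enum dir d)
{
    if (d == FROM_DEVICE)
        memcpy(buf, p->mem + slot, len);
}
```

In this model the device only ever sees offsets inside `mem`, which is the property the restricted region is meant to provide; the cost is one `memcpy()` per mapping.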
There is however room for improvement if the drivers are willing to change their buffer allocation strategy. When you receive Wi-Fi frames you need to allocate buffers for the Wi-Fi device to DMA into, and that happens ahead of the DMA transfers by the Wi-Fi device. At buffer allocation time you could very well allocate these frames from the restricted DMA region without having to bounce buffer them since the host CPU is in control over where and when to DMA into.

That is, however, still a trade-off between saving that one copy and protection from the DMA tampering with the packet contents when the kernel is reading them. Notice how the copy effectively makes a snapshot of the contents, guaranteeing that the kernel has a consistent view of the packet, which is not true if the DMA could modify the buffer contents in the middle of CPU accesses.

I would say that the window just became so much narrower for the PCIe end-point to overwrite contents with the copy because it would have to happen within the dma_unmap_{page,single} time and before the copy is finished to the bounce buffer.

Not only. Imagine this:

a) Without bouncing:
- RX interrupt
- Pass the packet to the network stack
- Network stack validates the packet
- DMA overwrites the packet
- Network stack goes boom, because the packet changed after validation

b) With bouncing:
- RX interrupt
- Copy the packet to a DMA-inaccessible buffer
- Network stack validates the packet
- Network stack is happy, because the packet is guaranteed to stay the same after validation
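
[Editorial sketch] The two sequences above can be mimicked in plain C to show why the snapshot matters. Everything here is an assumption made for illustration: the one-byte "magic" validation rule, the `device_overwrite()` helper standing in for a misbehaving endpoint, and the function names. It models the ordering of events only, not real DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PKT_LEN 8

/* Hypothetical validation rule: the first byte must be a known magic. */
static bool pkt_valid(const unsigned char *pkt)
{
    return pkt[0] == 0x42;
}

/* Stand-in for a device rewriting memory it can still reach via DMA. */
static void device_overwrite(unsigned char *dma_buf)
{
    memset(dma_buf, 0xFF, PKT_LEN);
}

/*
 * a) Without bouncing: the stack validates and then consumes the
 * DMA-visible buffer itself. Returns whether the bytes it ends up
 * consuming still pass validation.
 */
bool rx_without_bounce(unsigned char *dma_buf)
{
    if (!pkt_valid(dma_buf))
        return false;
    device_overwrite(dma_buf);   /* device writes after validation */
    return pkt_valid(dma_buf);   /* the stack now reads changed data */
}

/*
 * b) With bouncing: the packet is first copied to a buffer the device
 * cannot reach, and both validation and consumption use that copy.
 */
bool rx_with_bounce(unsigned char *dma_buf)
{
    unsigned char snapshot[PKT_LEN];

    memcpy(snapshot, dma_buf, PKT_LEN);
    if (!pkt_valid(snapshot))
        return false;
    device_overwrite(dma_buf);   /* too late: only the DMA buffer changes */
    return pkt_valid(snapshot);
}
```

Given a buffer whose first byte is 0x42, `rx_without_bounce()` returns false because the validated bytes were replaced, while `rx_with_bounce()` returns true because the validated copy is the one that gets consumed.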

Yes, that's a much safer set of operations, thanks for walking through a practical example.
--
Florian