Thread (41 messages) 41 messages, 8 authors, 2022-04-01

Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

From: Thierry Reding <hidden>
Date: 2021-09-07 17:44:52
Also in: dri-devel, linux-devicetree, linux-iommu

On Tue, Sep 07, 2021 at 10:33:24AM -0500, Rob Herring wrote:
On Fri, Sep 3, 2021 at 10:36 AM Thierry Reding [off-list ref] wrote:
quoted
On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote:
quoted
On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding [off-list ref] wrote:
quoted
On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
quoted
On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding [off-list ref] wrote:
quoted
On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
quoted
01.07.2021 21:14, Thierry Reding пишет:
quoted
On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
quoted
On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
quoted
On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
quoted
On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
quoted
From: Thierry Reding <redacted>

Reserved memory region phandle references can be accompanied by a
specifier that provides additional information about how that specific
reference should be treated.

One use-case is to mark a memory region as needing an identity mapping
in the system's IOMMU for the device that references the region. This is
needed for example when the bootloader has set up hardware (such as a
display controller) to actively access a memory region (e.g. a boot
splash screen framebuffer) during boot. The operating system can use the
identity mapping flag from the specifier to make sure an IOMMU identity
mapping is set up for the framebuffer before IOMMU translations are
enabled for the display controller.

Signed-off-by: Thierry Reding <redacted>
---
 .../reserved-memory/reserved-memory.txt       | 21 +++++++++++++++++++
 include/dt-bindings/reserved-memory.h         |  8 +++++++
 2 files changed, 29 insertions(+)
 create mode 100644 include/dt-bindings/reserved-memory.h
Sorry for being slow on this. I have 2 concerns.

First, this creates an ABI issue. A DT with cells in 'memory-region'
will not be understood by an existing OS. I'm less concerned about this
if we address that with a stable fix. (Though I'm pretty sure we've
naively added #?-cells in the past ignoring this issue.)
A while ago I had proposed adding memory-region*s* as an alternative
name for memory-region to make the naming more consistent with other
types of properties (think clocks, resets, gpios, ...). If we added
that, we could easily differentiate between the "legacy" cases where
no #memory-region-cells was allowed and the new cases where it was.
quoted
Second, it could be the bootloader setting up the reserved region. If a
node already has 'memory-region', then adding more regions is more
complicated compared to adding new properties. And defining what each
memory-region entry is or how many in schemas is impossible.
It's true that updating the property gets a bit complicated, but it's
not exactly rocket science. We really just need to splice the array. I
have a working implemention for this in U-Boot.

For what it's worth, we could run into the same issue with any new
property that we add. Even if we renamed this to iommu-memory-region,
it's still possible that a bootloader may have to update this property
if it already exists (it could be hard-coded in DT, or it could have
been added by some earlier bootloader or firmware).
quoted
Both could be addressed with a new property. Perhaps something like
'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is
appropriate given this is entirely because of the IOMMU being in the
mix. I might feel differently if we had other uses for cells, but I
don't really see it in this case.
I'm afraid that down the road we'll end up with other cases and then we
might proliferate a number of *-memory-region properties with varying
prefixes.

I am aware of one other case where we might need something like this: on
some Tegra SoCs we have audio processors that will access memory buffers
using a DMA engine. These processors are booted from early firmware
using firmware from system memory. In order to avoid trashing the
firmware, we need to reserve memory. We can do this using reserved
memory nodes. However, the audio DMA engine also uses the SMMU, so we
need to make sure that the firmware memory is marked as reserved within
the SMMU. This is similar to the identity mapping case, but not exactly
the same. Instead of creating a 1:1 mapping, we just want that IOVA
region to be reserved (i.e. IOMMU_RESV_RESERVED instead of
IOMMU_RESV_DIRECT{,_RELAXABLE}).

That would also fall into the IOMMU domain, but we can't reuse the
iommu-memory-region property for that because then we don't have enough
information to decide which type of reservation we need.

We could obviously make iommu-memory-region take a specifier, but we
could just as well use memory-regions in that case since we have
something more generic anyway.

With the #memory-region-cells proposal, we can easily extend the cell in
the specifier with an additional MEMORY_REGION_IOMMU_RESERVE flag to
take that other use case into account. If we than also change to the new
memory-regions property name, we avoid the ABI issue (and we gain a bit
of consistency while at it).
Ping? Rob, do you want me to add this second use-case to the patch
series to make it more obvious that this isn't just a one-off thing? Or
how do we proceed?
Rob, given that additional use-case, do you want me to run with this
proposal and send out an updated series?

What about variant with a "descriptor" properties that will describe
each region:

fb_desc: display-framebuffer-memory-descriptor {
      needs-identity-mapping;
}

display@52400000 {
      memory-region = <&fb ...>;
      memory-region-descriptor = <&fb_desc ...>;
};

It could be a more flexible/extendible variant.
This problem recently came up on #dri-devel again. Adding Alyssa and
Sven who are facing a similar challenge on their work on Apple M1 (if I
understood correctly). Also adding dri-devel for visibility since this
is a very common problem for display in particular.

On M1 the situation is slightly more complicated: the firmware will
allocate a couple of buffers (including the framebuffer) in high memory
(> 4 GiB) and use the IOMMU to map that into an IOVA region below 4 GiB
so that the display hardware can access it. This makes it impossible to
bypass the IOMMU like we do on other chips (in particular to work around
the fault-by-default policy of the ARM SMMU driver). It also means that
in addition to the simple reserved regions I mentioned we need for audio
use-cases and identity mapping use-cases we need for display on Tegra,
we now also need to be able to convey physical to IOVA mappings.

Fitting the latter into the original proposal sounds difficult. A quick
fix would've been to generate a mapping table in memory and pass that to
the kernel using a reserved-memory node (similar to what's done for
example on Tegra for the EMC frequency table on Tegra210) and mark it as
such using a special flag. But that then involves two layers of parsing,
which seems a bit suboptimal. Another way to shoehorn that into the
original proposal would've been to add flags for physical and virtual
address regions and use pairs to pass them using special flags. Again,
this is a bit wonky because it needs these to be carefully parsed and
matched up.

Another downside is that we now have a situation where some of these
regions are no longer "reserved-memory regions" in the traditional
sense. This would require an additional flag in the reserved-memory
region nodes to prevent the IOVA regions from being reserved. By the
way, this is something that would also be needed for the audio use-case
I mentioned before, because the physical memory at that address can
still be used by an operating system.

A more general solution would be to draw a bit from Dmitry's proposal
and introduce a new top-level "iov-reserved-memory" node. This could be
modelled on the existing reserved-memory node, except that the physical
memory pages for regions represented by child nodes would not be marked
as reserved. Only the IOVA range described by the region would be
reserved subsequently by the IOMMU framework and/or IOMMU driver.

The simplest case where we just want to reserve some IOVA region could
then be done like this:

        iov-reserved-memory {
                /*
                 * Probably safest to default to <2>, <2> here given
                 * that most IOMMUs support either > 32 bits of IAS
                 * or OAS.
                 */
                #address-cells = <2>;
                #size-cells = <2>;

                firmware: firmware@80000000 {
                        reg = <0 0x80000000 0 0x01000000>;
                };
        };

        audio@30000000 {
                ...
                iov-memory-regions = <&firmware>;
                ...
        };

Mappings could be represented by an IOV reserved region taking a
reference to the reserved-region that they map:

        reserved-memory {
                #address-cells = <2>;
                #size-cells = <2>;

                /* 16 MiB of framebuffer at top-of-memory */
                framebuffer: framebuffer@1,ff000000 {
                        reg = <0x1 0xff000000 0 0x01000000>;
                        no-map;
                };
        };

        iov-reserved-memory {
                /* IOMMU supports only 32-bit output address space */
                #address-cells = <1>;
                #size-cells = <1>;

                /* 16 MiB of framebuffer mapped to top of IOVA */
                fb: fb@ff000000 {
                        reg = <0 0xff000000 0 0x01000000>;
                        memory-region = <&framebuffer>;
                };
        };

        display@40000000 {
                ...
                /* optional? */
                memory-region = <&framebuffer>;
                iov-memory-regions = <&fb>;
                ...
        };

It's interesting how identity mapped regions now become a trivial
special case of mappings. All that is needed is to make the reg property
of the IOV reserved region correspond to the reg property of the normal
reserved region. Alternatively, as a small optimization for lazy people
like me, we could just allow these cases to omit the reg property and
instead inherit it from the referenced reserved region.

As the second example shows it might be convenient if memory-region
could be derived from iov-memory-regions. This could be useful for cases
where the driver wants to do something with the physical pages of the
reserved region (such as mapping them and copying out the framebuffer
data to another buffer so that the reserved memory can be recycled). If
we have the IOV reserved region, we could provide an API to extract the
physical reserved region (if it exists). That way we could avoid
referencing it twice in DT. Then again, there's something elegant about
the explicit second reference to. It indicates the intent that we may
want to use the region for something other than just the IOV mapping.

Anyway, this has been long enough. Let me know what you think. Alyssa,
Sven, it'd be interesting to hear if you think this could work as a
solution to the problem on M1.

Rob, I think you might like this alternative because it basically gets
rid of all the points in the original proposal that you were concerned
about. Let me know what you think.
Couldn't we keep this all in /reserved-memory? Just add an iova
version of reg. Perhaps abuse 'assigned-address' for this purpose. The
issue I see would be handling reserved iova areas without a physical
area. That can be handled with just a iova and no reg. We already have
a no reg case.
I had thought about that initially. One thing I'm worried about is that
every child node in /reserved-memory will effectively cause the memory
that it described to be reserved. But we don't want that for regions
that are "virtual only" (i.e. IOMMU reservations).
By virtual only, you mean no physical mapping, just a region of
virtual space, right? For that we'd have no 'reg' and therefore no
(physical) reservation by the OS. It's similar to non-static regions.
You need a specific handler for them. We'd probably want a compatible
as well for these virtual reservations.
Yeah, these would be purely used for reserving regions in the IOVA so
that they won't be used by the IOVA allocator. Typically these would be
used for cases where those addresses have some special meaning.

Do we want something like:

        compatible = "iommu-reserved";

for these? Or would that need to be:

        compatible = "linux,iommu-reserved";

? There seems to be a mix of vendor-prefix vs. non-vendor-prefix
compatible strings in the reserved-memory DT bindings directory.
I would not use 'linux,' here.
quoted
On the other hand, do we actually need the compatible string? Because we
don't really want to associate much extra information with this like we
do for example with "shared-dma-pool". The logic to handle this would
all be within the IOMMU framework. All we really need is for the
standard reservation code to skip nodes that don't have a reg property
so we don't reserve memory for "virtual-only" allocations.
It doesn't hurt to have one and I can imagine we might want to iterate
over all the nodes. It's slightly easier and more common to iterate
over compatible nodes rather than nodes with some property.
quoted
quoted
Are these being global in DT going to be a problem? Presumably we have
a virtual space per IOMMU. We'd know which IOMMU based on a device's
'iommus' and 'memory-region' properties, but within /reserved-memory
we wouldn't be able to distinguish overlapping addresses from separate
address spaces. Or we could have 2 different IOVAs for 1 physical
space. That could be solved with something like this:

iommu-addresses = <&iommu1 <address cells> <size cells>>;
The only case that would be problematic would be if we have overlapping
physical regions, because that will probably trip up the standard code.

But this could also be worked around by looking at iommu-addresses. For
example, if we had something like this:

        reserved-memory {
                fb_dc0: fb@80000000 {
                        reg = <0x80000000 0x01000000>;
                        iommu-addresses = <0xa0000000 0x01000000>;
                };

                fb_dc1: fb@80000000 {
You can't have 2 nodes with the same name (actually, you can, they
just get merged together). Different names with the same unit-address
is a dtc warning. I'd really like to make that a full blown
overlapping region check.
Right... so this would be a lot easier to deal with using that earlier
proposal where the IOMMU regions were a separate thing and referencing
the reserved-memory nodes. In those cases we could just have the
physical reservation for the framebuffer once (so we don't get any
duplicates or overlaps) and then have each IOVA reservation reference
that to create the mapping.
quoted
                        reg = <0x80000000 0x01000000>;
                        iommu-addresses = <0xb0000000 0x01000000>;
                };
        };

We could make the code identify that this is for the same physical
reservation (maybe make it so that reg needs to match exactly for this
to be recognized) but with different virtual allocations.

On a side-note: do we really need to repeat the size? I'd think if we
want mappings then we'd likely want them for the whole reservation.
Humm, I suppose not, but dropping it paints us into a corner if we
come up with wanting a different size later. You could have a carveout
for double/triple buffering your framebuffer, but the bootloader
framebuffer is only single buffered. So would you want actual size?
Perhaps this needs to be a bit more verbose then. If we want the ability
to create a mapping for only a partial reservation, I could imagine we
may as well want one that doesn't start at the beginning. So perhaps an
ever better solution would be to have a complete mapping, something that
works similar to "ranges" perhaps, like so:

	fb@80000000 {
		reg = <0x80000000 0x01000000>;
		iommu-ranges = <0x80000000 0x01000000 0x80000000>;
	};

That would be for a full identity mapping, but we could also have
something along the lines of this:

	fb@80000000 {
		reg = <0x80000000 0x01000000>;
		iommu-ranges = <0x80100000 0x00100000 0xa0000000>;
	};

So that would only map a 1 MiB chunk at offset 1 MiB (of the physical
reservation) to I/O virtual address 0xa0000000.
quoted
I'd like to keep references to IOMMUs out of this because they would be
duplicated. We will only use these nodes if they are referenced by a
device node that also has an iommus property. Also, the IOMMU reference
itself isn't enough. We'd also need to support the complete specifier
because you can have things like SIDs in there to specify the exact
address space that a device uses.

Also, for some of these they may be reused independently of the IOMMU
address space. For example the Tegra framebuffer identity mapping can
be used by either of the 2-4 display controllers, each with (at least
potentially) their own address space. But we don't want to have to
describe the identity mapping separately for each display controller.
Okay, but I'd rather have to duplicate things in your case than not be
able to express some other case.
The earlier "separate iov-reserved-memory" proposal would be a good
compromise here. It'd allow us to duplicate only the necessary bits
(i.e. the IOVA mappings) but keep the common bits simple. And even
the IOVA mappings could be shared for cases like identity mappings.
See below for more on that.
quoted
Another thing to consider is that these nodes will often be added by
firmware (e.g. firmware will allocate the framebuffer and set up the
corresponding reserved memory region in DT). Wiring up references like
this would get very complicated very quickly.
Yes.

The using 'iommus' property option below can be optional and doesn't
have to be defined/supported now. Just trying to think ahead and not
be stuck with something that can't be extended.
One other benefit of the separate iov-reserved-memory node would be that
the iommus property could be simplified. If we have a physical
reservation that needs to be accessed by multiple different display
controllers, we'd end up with something fairly complex, such as this:

	fb: fb@80000000 {
		reg = <0x80000000 0x01000000>;
		iommus = <&dc0_iommu 0xa0000000 0x01000000>,
			 <&dc1_iommu 0xb0000000 0x01000000>,
			 <&dc2_iommu 0xc0000000 0x01000000>;
	};

This would get even worse if we want to support partial mappings. Also,
it'd become quite complicated to correlate this with the memory-region
references:

	dc0: dc@40000000 {
		...
		memory-region = <&fb>;
		iommus = <&dc0_iommu>;
		...
	};

So now you have to go match up the phandle (and potentially specifier)
in the iommus property of the disp0 node with an entry in the fb node's
iommus property. That's all fairly complicated stuff.

With separate iov-reserved-memory, this would be a bit more verbose, but
each individual node would be simpler:

	reserved-memory {
		fb: fb@80000000 {
			reg = <0x80000000 0x01000000>;
		};
	};

	iov-reserved-memory {
		fb0: fb@80000000 {
			/* identity mapping, "reg" optional? */
			reg = <0x80000000 0x01000000>;
			memory-region = <&fb>;
		};

		fb1: fb@90000000 {
			/* but doesn't have to be */
			reg = <0x90000000 0x01000000>;
			memory-region = <&fb>;
		};

		fb2: fb@a0000000 {
			/* can be partial, too */
			ranges = <0x80000000 0x00800000 0xa0000000>;
			memory-region = <&fb>;
		};
	}

	dc0: dc@40000000 {
		iov-memory-regions = <&fb0>;
		/* optional? */
		memory-region = <&fb>;
		iommus = <&dc0_iommu>;
	};

Alternatively, if we want to support partial mappings, we could replace
those reg properties by ranges properties that I showed earlier. We may
even want to support both. Use "reg" for virtual-only reservations and
identity mappings, or "simple partial mappings" (that map a sub-region
starting from the beginning). Identity mappings could still be
simplified by just omitting the "reg" property. For more complicated
mappings, such as the ones on M1, the "ranges" property could be used.

Note how this looks a bit boilerplate-y, but it's actually really quite
simple to understand, even for humans, I think.

Also, the phandles in this are comparatively easy to wire up because
they can all be generated in a hierarchical way: generate physical
reservation and store phandle, then generate I/O virtual reservation
to reference that phandle and store the new phandle as well. Finally,
wire this up to the display controller (using either the IOV phandle or
both).

Granted, this requires the addition of a new top-level node, but given
how expressive this becomes, I think it might be worth a second
consideration.

Thierry

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help