Re: [RFC PATCH 0/5] drm/panthor: Protected mode support for Mali CSF GPUs

From: Boris Brezillon <boris.brezillon@collabora.com>
Date: 2025-02-11 14:33:00
Also in: dmaengine, dri-devel, linux-devicetree, linux-media, linux-mediatek, lkml

On Tue, 11 Feb 2025 14:46:56 +0100
Maxime Ripard [off-list ref] wrote:

Hi Boris,

On Fri, Feb 07, 2025 at 04:02:53PM +0100, Boris Brezillon wrote:

quoted

Sorry for joining the party late, a couple of comments to back Akash
and Nicolas' concerns.

On Wed, 05 Feb 2025 13:14:14 -0500
Nicolas Dufresne [off-list ref] wrote:

quoted

Le mercredi 05 février 2025 à 15:52 +0100, Maxime Ripard a écrit :

quoted

On Mon, Feb 03, 2025 at 04:43:23PM +0000, Florent Tomasin wrote:

quoted

Hi Maxime, Nicolas

On 30/01/2025 17:47, Nicolas Dufresne wrote:

quoted

Le jeudi 30 janvier 2025 à 17:38 +0100, Maxime Ripard a écrit :

quoted

Hi Nicolas,

On Thu, Jan 30, 2025 at 10:59:56AM -0500, Nicolas Dufresne wrote:

quoted

Le jeudi 30 janvier 2025 à 14:46 +0100, Maxime Ripard a écrit :

quoted

Hi,

I started to review it, but it's probably best to discuss it here.

On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote:

quoted

Hi,

This is a patch series covering the support for protected mode execution in
Mali Panthor CSF kernel driver.

The Mali CSF GPUs come with the support for protected mode execution at the
HW level. This feature requires two main changes in the kernel driver:

1) Configure the GPU with a protected buffer. The system must provide a DMA
   heap from which the driver can allocate a protected buffer.
   It can be a carved-out memory or dynamically allocated protected memory region.
   Some system includes a trusted FW which is in charge of the protected memory.
   Since this problem is integration specific, the Mali Panthor CSF kernel
   driver must import the protected memory from a device specific exporter.

Why do you need a heap for it in the first place? My understanding of
your series is that you have a carved out memory region somewhere, and
you want to allocate from that carved out memory region your buffers.

How is that any different from using a reserved-memory region, adding
the reserved-memory property to the GPU device and doing all your
allocation through the usual dma_alloc_* API?

How do you then multiplex this region so it can be shared between
GPU/Camera/Display/Codec drivers and also userspace ?

You could point all the devices to the same reserved memory region, and
they would all allocate from there, including for their userspace-facing
allocations.

I get that using memory region is somewhat more of an HW description, and
aligned with what a DT is supposed to describe. One of the challenge is that
Mediatek heap proposal endup calling into their TEE, meaning knowing the region
is not that useful. You actually need the TEE APP guid and its IPC protocol. If
we can dell drivers to use a head instead, we can abstract that SoC specific
complexity. I believe each allocated addressed has to be mapped to a zone, and
that can only be done in the secure application. I can imagine similar needs
when the protection is done using some sort of a VM / hypervisor.

Nicolas

The idea in this design is to abstract the heap management from the
Panthor kernel driver (which consumes a DMA buffer from it).

In a system, an integrator would have implemented a secure heap driver,
and could be based on TEE or a carved-out memory with restricted access,
or else. This heap driver would be responsible of implementing the
logic to: allocate, free, refcount, etc.

The heap would be retrieved by the Panthor kernel driver in order to
allocate protected memory to load the FW and allow the GPU to enter/exit
protected mode. This memory would not belong to a user space process.
The driver allocates it at the time of loading the FW and initialization
of the GPU HW. This is a device globally owned protected memory.

The thing is, it's really not clear why you absolutely need to have the
Panthor driver involved there. It won't be transparent to userspace,
since you'd need an extra flag at allocation time, and the buffers
behave differently. If userspace has to be aware of it, what's the
advantage to your approach compared to just exposing a heap for those
secure buffers, and letting userspace allocate its buffers from there?

Unless I'm mistaken, the Panthor driver loads its own firmware. Since loading
the firmware requires placing the data in a protected memory region, and that
this aspect has no exposure to userspace, how can Panthor not be implicated ?

Right, the very reason we need protected memory early is because some
FW sections need to be allocated from the protected pool, otherwise the
TEE will fault as soon at the FW enters the so-called 'protected mode'.

How does that work if you don't have some way to allocate the protected
memory? You can still submit jobs to the GPU, but you can't submit /
execute "protected jobs"?

Exactly.

quoted

Now, it's not impossible to work around this limitation. For instance,
we could load the FW without this protected section by default (what we
do right now), and then provide a DRM_PANTHOR_ENABLE_FW_PROT_MODE
ioctl that would take a GEM object imported from a dmabuf allocated
from the protected dma-heap by userspace. We can then reset the FW and
allow it to operate in protected mode after that point.

Urgh, I'd rather avoid that dance if possible :)

Me too.

quoted

This approach has two downsides though:

1. We have no way of checking that the memory we're passed is actually
suitable for FW execution in a protected context. If we're passed
random memory, this will likely hang the platform as soon as we enter
protected mode.

It's a current limitation of dma-buf in general, and you'd have the same
issue right now if someone imports a buffer, or misconfigure the heap
for a !protected heap.

I'd really like to have some way to store some metadata in dma_buf, if
only to tell that the buffer is protected.

The dma_buf has a pointer to its ops, so it should be relatively easy
to add an is_dma_buf_coming_from_this_heap() helper. Of course this
implies linking the consumer driver to the heap it's supposed to take
protected buffers from, which is basically the thing being discussed
here :-).

I suspect you'd also need that if you do things like do protected video
playback through a codec, get a protected frame, and want to import that
into the GPU. Depending on how you allocate it, either the codec or the
GPU or both will want to make sure it's protected.

If it's all allocated from a central "protected" heap (even if that
goes through the driver calling the dma_heap_alloc_buffer()), it
shouldn't be an issue.

quoted

2. If the driver already boot the FW and exposed a DRI node, we might
have GPU workloads running, and doing a FW reset might incur a slight
delay in GPU jobs execution.

I think #1 is a more general issue that applies to suspend buffers
allocated for GPU contexts too. If we expose ioctls where we take
protected memory buffers that can possibly lead to crashes if they are
not real protected memory regions, and we have no way to ensure the
memory is protected, we probably want to restrict these ioctls/modes to
some high-privilege CAP_SYS_.

For #2, that's probably something we can live with, since it's a
one-shot thing. If it becomes an issue, we can even make sure we enable
the FW protected-mode before the GPU starts being used for real.

This being said, I think the problem applies outside Panthor, and it
might be that the video codec can't reset the FW/HW block to switch to
protected mode as easily as Panthor.

Note that there's also downsides to the reserved-memory node approach,
where some bootloader stage would ask the secure FW to reserve a
portion of mem and pass this through the DT. This sort of things tend to
be an integration mess, where you need all the pieces of the stack (TEE,
u-boot, MTK dma-heap driver, gbm, ...) to be at a certain version to
work properly. If we go the ioctl() way, we restrict the scope to the
TEE, gbm/mesa and the protected-dma-heap driver, which is still a lot,
but we've ripped the bootloader out of the equation at least.

Yeah. I also think there's two discussions in parallel here:

 1) Being able to allocate protected buffers from the driver
 2) Exposing an interface to allocate those to userspace

I'm not really convinced we need 2, but 1 is obviously needed from what
you're saying.

I suspect we need #2 for GBM, still. But that's what dma-heaps are for,
so I don't think that's a problem.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help