Re: [PATCH 06/26] drm/bridge: add devm_drm_of_find_bridge

From: Maxime Ripard <mripard@kernel.org>
Date: 2025-12-15 10:35:59
Also in: imx, linux-amlogic, linux-doc, linux-mediatek, linux-renesas-soc, linux-samsung-soc, lkml

On Fri, Dec 12, 2025 at 12:10:37PM +0100, Luca Ceresoli wrote:

Hi Maxime,

On Thu Dec 11, 2025 at 6:47 PM CET, Luca Ceresoli wrote:

quoted

Hi Maxime,

On Mon Dec 1, 2025 at 5:51 PM CET, Maxime Ripard wrote:

quoted

On Mon, Nov 24, 2025 at 05:25:39PM +0100, Luca Ceresoli wrote:

quoted

Hi Maxime,

On Mon Nov 24, 2025 at 11:39 AM CET, Maxime Ripard wrote:

quoted

On Wed, Nov 19, 2025 at 02:05:37PM +0100, Luca Ceresoli wrote:

quoted

Several drivers (about 20) follow the same pattern:

 1. get a pointer to a bridge (typically the next bridge in the chain) by
    calling of_drm_find_bridge()
 2. store the returned pointer in the private driver data, keep it until
    driver .remove
 3. dereference the pointer at attach time and possibly at other times

of_drm_find_bridge() is now deprecated because it does not increment the
refcount and should be replaced with drm_of_find_bridge() +
drm_bridge_put().

However some of those drivers have a complex code flow and adding a
drm_bridge_put() call in all the appropriate locations is error-prone,
leads to ugly and more complex code, and can lead to errors over time with
code flow changes.

To handle all those drivers in a straightforward way, add a devm variant of
drm_of_find_bridge() that adds a devm action to invoke drm_bridge_put()
when the said driver is removed. This allows all those drivers to put the
reference automatically and safely with a one line change:

  - priv->next_bridge = of_drm_find_bridge(remote_np);
  + priv->next_bridge = devm_drm_of_find_bridge(dev, remote_np);

Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

---
 drivers/gpu/drm/drm_bridge.c | 30 ++++++++++++++++++++++++++++++
 include/drm/drm_bridge.h     |  5 +++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
index 09ad825f9cb8..c7baafbe5695 100644
--- a/drivers/gpu/drm/drm_bridge.c
+++ b/drivers/gpu/drm/drm_bridge.c

@@ -1446,6 +1446,36 @@ struct drm_bridge *drm_of_find_bridge(struct device_node *np)
 }
 EXPORT_SYMBOL(drm_of_find_bridge);

+/**
+ * devm_drm_of_find_bridge - find the bridge corresponding to the device
+ *			     node in the global bridge list and add a devm
+ *			     action to put it
+ *
+ * @dev: device requesting the bridge
+ * @np: device node
+ *
+ * On success the returned bridge refcount is incremented, and a devm
+ * action is added to call drm_bridge_put() when @dev is removed. So the
+ * caller does not have to put the returned bridge explicitly.
+ *
+ * RETURNS:
+ * drm_bridge control struct on success, NULL on failure
+ */
+struct drm_bridge *devm_drm_of_find_bridge(struct device *dev, struct device_node *np)
+{
+	struct drm_bridge *bridge = drm_of_find_bridge(np);
+
+	if (bridge) {
+		int err = devm_add_action_or_reset(dev, drm_bridge_put_void, bridge);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	return bridge;
+}
+EXPORT_SYMBOL(devm_drm_of_find_bridge);
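
For readers unfamiliar with devm actions, here is a minimal userspace sketch of the mechanism the patch relies on. It is not kernel code: mock_device, mock_ref, mock_devm_add_action_or_reset() and mock_device_release() are hypothetical stand-ins, and the -1 error value replaces the kernel's errno codes. It only illustrates that a registered action runs when the device goes away, and that the "or_reset" variant runs it immediately if registration fails.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*action_fn)(void *data);

/* Hypothetical stand-ins for the kernel's devres bookkeeping. */
struct mock_action {
	action_fn fn;
	void *data;
	struct mock_action *next;
};

struct mock_device {
	struct mock_action *actions;	/* most recently added first */
};

/* Register fn(data) to run at release; on failure run it right away. */
static int mock_devm_add_action_or_reset(struct mock_device *dev,
					 action_fn fn, void *data)
{
	struct mock_action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);	/* the "or_reset" part */
		return -1;	/* assumption: the kernel returns an errno */
	}
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Run all registered actions in reverse order of registration. */
static void mock_device_release(struct mock_device *dev)
{
	while (dev->actions) {
		struct mock_action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

/* Mock refcounted object standing in for a bridge. */
struct mock_ref {
	int refcount;
};

static void mock_ref_put_void(void *data)
{
	struct mock_ref *r = data;

	assert(r->refcount > 0);
	r->refcount--;
}

/* Returns 10 * (refcount while bound) + (refcount after release). */
int demo_devm_put(void)
{
	struct mock_device dev = { NULL };
	struct mock_ref bridge = { 1 };	/* reference held by its own driver */
	int during;

	bridge.refcount++;		/* the "find" takes a reference */
	if (mock_devm_add_action_or_reset(&dev, mock_ref_put_void, &bridge))
		return -1;
	during = bridge.refcount;
	mock_device_release(&dev);	/* consumer removed: the put runs here */
	return during * 10 + bridge.refcount;
}
```

In this mock the consumer never calls the put itself. That is the convenience the one-line conversion is after, and also why the reference is dropped as soon as the consumer device is removed.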

That's inherently unsafe though, because even if the bridge is removed
other parts of DRM might still have a reference to it and could call
into it.

We'd then have dropped our reference to the next bridge, which could
have been freed, and it's a use-after-free.

I think you refer to this scenario:

  1. pipeline: encoder --> bridge A --> bridge B --> bridge C
  2. encoder takes a reference to bridge B
     using devm_drm_of_find_bridge() or other means
  3. bridge B takes a next_bridge reference to bridge C
     using devm_drm_of_find_bridge()
  4. encoder calls (bridge B)->foo(), which in turn references
     next_bridge, e.g.:

       b_foo() {
           bar(b->next_bridge);
       }

If bridges B and C are removed, bridge C can be freed but B is still
allocated because the encoder holds a ref. So when step 4 happens, 'b->c'
would be a use-after-free (or NULL deref if b.remove cleared it, which is
just as bad).

Yep.

quoted

If I got you correctly, then I'm a bit surprised by your comment. This
series is part of the first chapter of the hotplug work, which does not aim
at fixing everything but rather at fixing one part: handle dynamic
_allocation_ lifetime of drm_bridges by adding a refcount and
drm_bridge_get/put().

Chapter 2 of the work is adding drm_bridge_enter/exit/unplug() [1] and
other changes in order to prevent driver code of removed bridges from
accessing fields it shouldn't. So the above example at point 4 would become:

       b_foo() {
           if (!drm_bridge_enter())
               return;
           bar(b->c);
           drm_bridge_exit();
       }

And that avoids 'b->c' after bridge B is removed.
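
A minimal single-threaded userspace sketch of that guard pattern follows. The names guard_bridge, guard_bridge_enter(), guard_bridge_exit() and guard_bridge_unplug() are hypothetical, and a plain flag stands in for whatever synchronization the proposal in [1] uses, so this only illustrates the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock, not the kernel's struct drm_bridge. */
struct guard_bridge {
	bool unplugged;
	struct guard_bridge *next_bridge;
	int calls;	/* how many times bar() reached this bridge */
};

/* Assumption: a plain flag check; a real version must handle concurrency. */
static bool guard_bridge_enter(struct guard_bridge *b)
{
	return !b->unplugged;
}

static void guard_bridge_exit(struct guard_bridge *b)
{
	(void)b;	/* nothing to release in this mock */
}

/* Called from the mock "remove" path of bridge B. */
static void guard_bridge_unplug(struct guard_bridge *b)
{
	b->unplugged = true;
	b->next_bridge = NULL;	/* reference dropped, pointer cleared */
}

static void bar(struct guard_bridge *next)
{
	assert(next != NULL);
	next->calls++;
}

void b_foo(struct guard_bridge *b)
{
	if (!guard_bridge_enter(b))
		return;		/* B was removed: do not touch next_bridge */
	bar(b->next_bridge);
	guard_bridge_exit(b);
}

/* Returns how many times bridge C was reached through B. */
int demo_enter_exit(void)
{
	struct guard_bridge c = { false, NULL, 0 };
	struct guard_bridge b = { false, &c, 0 };

	b_foo(&b);			/* reaches C */
	guard_bridge_unplug(&b);	/* B is removed */
	b_foo(&b);			/* returns early, C is not touched */
	return c.calls;
}
```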

Does that answer your remark?

Not really. I wasn't really questioning your current focus, or the way
you laid out the current agenda or whatever.

What I am questioning though is whether or not we want to introduce
something we will have to untangle soon, and even more so when we're not
mentioning it anywhere.

quoted

It's more complicated than it sounds, because we only have access to the
drm_device when the bridge is attached, so later than probe.

I wonder if we shouldn't tie the lifetime of that reference to the
lifetime of the bridge itself, and we would give up the next_bridge
reference only when we're destroyed ourselves.

I'm afraid I'm not following you, sorry. Do you refer to the time between
the bridge removal (driver .remove) and the last bridge put (when
deallocation happens)?

In that time frame the struct drm_bridge is still allocated along with any
next_bridge pointer it may contain, but the following bridge could have
been deallocated.

What do you mean by "give up the next_bridge"?

What I was trying to say was that if we want to fix the problem you
illustrated above, we need to give up the reference at __drm_bridge_free
time. So each bridge having a reference to a bridge would need to do so
in its destroy hook.

Since it's quite a common pattern, it would make sense to add a
next_bridge field to drm_bridge itself, so the core can do it
automatically in __drm_bridge_free if that pointer is !NULL.
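
As a minimal userspace sketch of that idea (not kernel code): mock_bridge and its helpers below are hypothetical, a plain int stands in for the kernel's refcount, and freed_flag exists only so the demo can observe when each object is freed. The point it illustrates is that the next bridge stays allocated for as long as the bridge pointing to it does.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace mock, not the kernel's struct drm_bridge. */
struct mock_bridge {
	int refcount;			/* plain int instead of a kref */
	struct mock_bridge *next_bridge;/* hypothetical core-owned field */
	int *freed_flag;		/* set to 1 on free, demo only */
};

static struct mock_bridge *mock_bridge_alloc(int *freed_flag)
{
	struct mock_bridge *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->refcount = 1;
	b->freed_flag = freed_flag;
	return b;
}

static struct mock_bridge *mock_bridge_get(struct mock_bridge *b)
{
	if (b)
		b->refcount++;
	return b;
}

static void mock_bridge_put(struct mock_bridge *b);

/* Free-time hook: the next_bridge reference is dropped here, not earlier. */
static void mock_bridge_free(struct mock_bridge *b)
{
	struct mock_bridge *next = b->next_bridge;

	*b->freed_flag = 1;
	free(b);
	mock_bridge_put(next);	/* NULL-safe */
}

static void mock_bridge_put(struct mock_bridge *b)
{
	if (!b)
		return;
	assert(b->refcount > 0);
	if (--b->refcount == 0)
		mock_bridge_free(b);
}

/* Returns 1 if C outlived every reference to B, -1 on allocation failure. */
int demo_next_bridge_lifetime(void)
{
	int b_freed = 0, c_freed = 0;
	struct mock_bridge *c = mock_bridge_alloc(&c_freed);
	struct mock_bridge *b = mock_bridge_alloc(&b_freed);
	struct mock_bridge *encoder_ref;
	int c_alive_while_b_referenced;

	if (!b || !c) {
		free(b);
		free(c);
		return -1;
	}
	b->next_bridge = mock_bridge_get(c);	/* B holds a reference to C */
	encoder_ref = mock_bridge_get(b);	/* encoder holds a reference to B */

	/* Both drivers are removed: each drops its own initial reference. */
	mock_bridge_put(c);
	mock_bridge_put(b);

	/* The encoder can still reach B, and B->next_bridge is still valid. */
	c_alive_while_b_referenced = !b_freed && !c_freed;

	mock_bridge_put(encoder_ref);	/* last reference: frees B, then C */

	return c_alive_while_b_referenced && b_freed && c_freed;
}
```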

But...

quoted

Storing a list of all the references we need to drop is going to be
intrusive though, so maybe the easiest way to do it would be to create a
next_bridge field in drm_bridge, and only drop the reference stored
there?

And possibly tie the whole thing together using a helper?

Anyway, I'm not sure it should be a prerequisite to this series. If we do
want to go the devm_drm_of_find_bridge route however, we should at least
document that it's unsafe, and add a TODO entry to clean up the mess
later on.

... I *really* don't consider it something you need to work on right now.

quoted

Do you mean the devm variant is unsafe while the original
(drm_of_find_bridge() in this series, might be renamed) is not? I
don't see how that can happen. If the driver for bridge B were to use
drm_of_find_bridge(), that driver would be responsible to
drm_bridge_put(b->next_bridge) in its .remove() function or earlier.
So the next_bridge pointing to bridge C would equally become subject
to use-after-free.

No, I was saying that both are equally unsafe. But we're adding a new,
broken, helper, and we don't mention anywhere that it is. So what I was
saying is mostly do we really want to introduce some more broken code
when we know it is. And if we do, we should be really clear about it.

quoted

devm does not make it worse, on the contrary it postpones the
drm_bridge_put(next_bridge) as late as possible: just after
b.remove().

Which doesn't really change anything, does it? I'd expect the window
between the remove and final drm_bridge_put to be much wider than the
execution time of remove itself.

quoted

One final, high-level thought about the various 'next_bridge' pointers that
many bridge drivers have. Most of them do:

 0. have a 'struct drm_bridge *next_bridge' in their private struct
 1. take the next_bridge reference during probe or another startup phase
 2. store it in their private driver struct
 3. use it to call drm_bridge_attach
 4. (pending) put the reference to it in their .remove or earlier

I'm wondering whether we could let the DRM bridge core do it all, by
removing items 0, 1, 2 and 4, and change 3 as:

-     drm_bridge_attach(encoder, me->next_bridge, &me->bridge, flags);
+  drm_of_bridge_attach(encoder, &me->bridge, dev->of_node, 1, -1, flags);

where dev->of_node and the following integers are the same flags passed to
devm_drm_of_get_bridge() and the like, i.e. the endpoint info needed to
walk the DT graph and reach the next bridge.

This would allow the core to take care of all locking and lifetime of the
next bridge, and most (all?) bridges would never access any pointers to the
next bridge. The idea is to let the core do the right thing in a single
place instead of trying to make all drivers do the right thing (and
touching dozens of files whenever the logic needs to change).

That is more a long-term ideal than something I'd do right now, but having
opinions would be very interesting.

That was pretty much my point, yeah.

Maxime

Let me recap this discussion, because there are various aspects and I need
to clarify my view on it.

First: the problem you discuss is about drm_of_find_bridge() introduced in
patch 1. The devm variant is just equally affected.

You proposed adding a next_bridge field in struct drm_bridge so there is an
automated, common call to drm_bridge_put() (and setting it to NULL). It
would remove some burden on individual drivers of course, but I don't think
it would solve the problem. In the same scenario we are discussing
(i.e. encoder --> bridge A --> bridge B --> bridge C, then B+C get removed)
B's next_bridge would be automatically put, but the encoder could still
call B->foo(), which could still do B->next_bridge.

Ah, I realized I'm wrong here. Your proposal is to put the reference at
__drm_bridge_free time, not at release time. So yes, it would work. At least
for the simple cases, where there's only the next_bridge pointer stored.

Yes, that's exactly what I was trying to say :)

quoted

Additionally, as a matter of fact there are currently drivers storing
bridge pointers. The next_bridge is the most common case. Code using
drm_bridge_connector_init() for example can store up to eight of them, but
individual drivers are the hardest to hunt for.

I can see these (potential) tools to handle this (not mutually exclusive):

 1. remove drm_bridge pointers pointing to other bridges
 2. check whether a bridge (say B) still exists before any dereference
    to B->another_bridge: that's drm_bridge_enter/exit()
 3. let owners of bridge pointers be notified when a bridge is unplugged,
    so they can actively put their reference and clear their pointer

For item 1, I think the drm_of_bridge_attach() idea quoted above would
work, at least for the simple cases where bridge drivers use the
next_bridge only for attach. A next_bridge pointer in struct drm_bridge is
not even needed in that case, the pointer would be computed from OF when
needed and not stored. I can do an experiment and send a first series, do
you think it would be useful?

I had a look and, while the implementation should be simple, only a few
drivers could benefit right now. The majority fall into one of these
categories:

 * drivers using drm_of_find_panel_or_bridge() or *_of_get_bridge()
   (maybe 60-80% of all drivers, those will have to wait for the panel
   improvements)
 * drivers using the next_bridge pointer for more than just attach
 * drivers doing more complicated stuff

I think your "put next_bridge in __drm_bridge_free" idea would fit the
2nd category well and perhaps also the 1st one. For the 3rd category we'd
need something different, e.g. a per-driver .destroy callback.

Yep, that's fine. We should optimize for the common case, with an escape
hatch. That's exactly what we are talking about here.

So, while your idea would work, it would avoid use-after-free but not
prevent calls into a bridge's code after the bridge is removed, which are,
in the best case, useless. I still think we should aim at preventing the
dereferences from happening at all, so my 3 ideas above still look
important to evaluate.

It's useless indeed, but it's something that can and will happen, so we
still have to take it into account. We should avoid any unsafe behaviour,
but it's the best we can do.

Maxime
