Re: [PATCH 1/9] of: Add warpper function of_find_node_by_name_balanced()
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Date: 2025-02-07 08:57:39
Also in:
arm-scmi, linux-devicetree, linux-media, netdev
Hi Oleksij, On Fri, Feb 07, 2025 at 09:24:10AM +0100, Oleksij Rempel wrote:
On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:quoted
There are many drivers use of_find_node_by_name() with a not-NULL device_node pointer, and a number of callers would require a call to of_node_get() before using it. There are also some drivers who forget to call of_node_get() which would cause a ref count leak[1]. So, Add a wraper function for of_find_node_by_name(), drivers may use this function to call of_find_node_by_name() with the refcount already balanced. [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/ (local)Hi Zhang Zekun, thank you for working on this issue! First of all, let's take a step back and analyze the initial problem. Everything following is only my opinion... The main issue I see is that the current API - of_find_node_by_name - modifies the refcount of its input by calling of_node_put(from) as part of its search. Typically, a "find" function is expected to treat its input as read-only. That is, when you pass an object into such a function, you expect its reference count to remain unchanged unless ownership is explicitly transferred. In this case, lowering the refcount on the input node is counterintuitive and already lead to unexpected behavior and subtle bugs. To address this, the workaround introduces a wrapper function, of_find_node_by_name_balanced, which first increments the input’s refcount (via of_node_get()) before calling the original function. While this "balances" the refcount change, the naming remains problematic from my perspective. The "_balanced" suffix isn’t part of our common naming conventions (traditions? :)). Most drivers expect that a function starting with "find" will not alter the reference count of its input. The term "balanced" doesn’t clearly convey that the input's refcount is being explicitly managed - it instead obscures the underlying behavior, leaving many developers confused about what guarantees the API provides. In my view, a more natural solution would be to redesign the API so that it doesn’t modify the input object’s refcount at all. Instead, it should solely increase the refcount of the returned node (if found) for safe asynchronous usage. This approach would align with established conventions where "find" implies no side effects on inputs or output, and a "get" indicates that the output comes with an extra reference. For example, a function named of_get_node_by_name would clearly signal that only the returned node is subject to a refcount increase while leaving the input intact. Thus, while the current workaround "balances" the reference count, it doesn't address the underlying design flaw. The naming still suggests a "find" function that should leave the input untouched, which isn’t the case here. A redesign of the API - with both the behavior and naming aligned to common expectations - would be a clearer and more robust solution. Nevertheless, it is only my POV, and the final decision rests with the OpenFirmware framework maintainers.
I agree overall that the naming is not optimal. Looking at the other patches in the series, I think at least some of them misuse of_find_node_by_name(). For instance, drivers/media/i2c/max9286.c calls the function to find a *child* node of the device's of_node named "i2c-mux", while of_find_node_by_name() looks at children first but will then walk the *whole* DT to find a named node. I haven't checked all patches, but other ones seem to suffer from the same misuse. Assuming that the named node those drivers are looking for is a direct child of the node passed as argument to of_find_node_by_name(), the right fix would tbe to use of_get_child_by_name(). If it's not a direct child, a recursive version of of_get_child_by_name() could be useful. -- Regards, Laurent Pinchart