Re: [PATCH 1/9] of: Add warpper function of_find_node_by_name_balanced()
From: Rob Herring <robh@kernel.org>
Date: 2025-02-11 14:15:52
Also in:
arm-scmi, linux-devicetree, linux-media, netdev
On Fri, Feb 7, 2025 at 9:37 AM Laurent Pinchart [off-list ref] wrote:
Hi Zekun, On Fri, Feb 07, 2025 at 07:28:23PM +0800, zhangzekun (A) wrote:quoted
在 2025/2/7 16:24, Oleksij Rempel 写道:quoted
On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:quoted
There are many drivers use of_find_node_by_name() with a not-NULL device_node pointer, and a number of callers would require a call to of_node_get() before using it. There are also some drivers who forget to call of_node_get() which would cause a ref count leak[1]. So, Add a wraper function for of_find_node_by_name(), drivers may use this function to call of_find_node_by_name() with the refcount already balanced. [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/ (local)Hi Zhang Zekun, thank you for working on this issue! First of all, let's take a step back and analyze the initial problem. Everything following is only my opinion... The main issue I see is that the current API - of_find_node_by_name - modifies the refcount of its input by calling of_node_put(from) as part of its search. Typically, a "find" function is expected to treat its input as read-only. That is, when you pass an object into such a function, you expect its reference count to remain unchanged unless ownership is explicitly transferred. In this case, lowering the refcount on the input node is counterintuitive and already lead to unexpected behavior and subtle bugs. To address this, the workaround introduces a wrapper function, of_find_node_by_name_balanced, which first increments the input’s refcount (via of_node_get()) before calling the original function. While this "balances" the refcount change, the naming remains problematic from my perspective. The "_balanced" suffix isn’t part of our common naming conventions (traditions? :)). Most drivers expect that a function starting with "find" will not alter the reference count of its input. The term "balanced" doesn’t clearly convey that the input's refcount is being explicitly managed - it instead obscures the underlying behavior, leaving many developers confused about what guarantees the API provides. In my view, a more natural solution would be to redesign the API so that it doesn’t modify the input object’s refcount at all. Instead, it should solely increase the refcount of the returned node (if found) for safe asynchronous usage. This approach would align with established conventions where "find" implies no side effects on inputs or output, and a "get" indicates that the output comes with an extra reference. For example, a function named of_get_node_by_name would clearly signal that only the returned node is subject to a refcount increase while leaving the input intact. Thus, while the current workaround "balances" the reference count, it doesn't address the underlying design flaw. The naming still suggests a "find" function that should leave the input untouched, which isn’t the case here. A redesign of the API - with both the behavior and naming aligned to common expectations - would be a clearer and more robust solution. Nevertheless, it is only my POV, and the final decision rests with the OpenFirmware framework maintainers. Best Regards, OleksijHi, Oleksij, Thanks for your review. I think redesign the API would be a fundamental way to fix this issue, but it would cause impact for current users who knows the exact functionality of of_find_node_by_name(), which need to put the "from" before calling it (I can't make the assumption that all of drivers calling of_find_node_by_name() with a not-NULL "from" pointer, but not calling of_node_get() before is misuse). The basic idea for adding a new function is try not to impact current users. For users who are confused about the "_balanced" suffix,the comments of of_find_node_by_name() explains why this interface exists.I think we all agree that of_find_node_by_name() is miused, and that it shows the API isn't optimal. What we have different opinions on is how to make the API less error-prone. I think adding a new of_find_node_by_name_balanced() function works around the issue and doesn't improve the situation much, I would argue it makes things even more confusing. We have only 20 calls to of_find_node_by_name() with a non-NULL first argument in v6.14-rc1: arch/powerpc/platforms/chrp/pci.c: rtas = of_find_node_by_name (root, "rtas"); The 'root' variable here is the result of a call to 'of_find_node_by_path("/")', so I think we could pass a null pointer instead to simplify things.
I think this could just be 'of_find_node_by_path("/rtas")' which does
occur elsewhere.
arch/powerpc/platforms/powermac/pic.c: slave = of_find_node_by_name(master, "mac-io"); Here I believe of_find_node_by_name() is called to find a *child* node of 'master'. of_find_node_by_name() is the wrong function for that.
Yes, I think that's always a child of the PCI bus.
arch/sparc/kernel/leon_kernel.c: np = of_find_node_by_name(rootnp, "GAISLER_IRQMP"); arch/sparc/kernel/leon_kernel.c: np = of_find_node_by_name(rootnp, "01_00d"); arch/sparc/kernel/leon_kernel.c: np = of_find_node_by_name(nnp, "GAISLER_GPTIMER"); arch/sparc/kernel/leon_kernel.c: np = of_find_node_by_name(nnp, "01_011"); Here too the code seems to be looking for child nodes only (but I couldn't find a DT example or binding in-tree, so I'm not entirely sure).
Sparc doesn't use OF_DYNAMIC, so who cares... But there are some Sparc DT dumps here: https://git.kernel.org/pub/scm/linux/kernel/git/davem/prtconfs.git/
drivers/clk/ti/clk.c: return of_find_node_by_name(from, tmp); Usage here seems correct, the reference-count decrement is intended. drivers/media/i2c/max9286.c: i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux"); drivers/media/platform/qcom/venus/core.c: enp = of_find_node_by_name(dev->of_node, node_name); drivers/net/dsa/bcm_sf2.c: ports = of_find_node_by_name(dn, "ports"); drivers/net/dsa/hirschmann/hellcreek_ptp.c: leds = of_find_node_by_name(hellcreek->dev->of_node, "leds"); drivers/net/ethernet/broadcom/asp2/bcmasp.c: ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports"); drivers/net/ethernet/marvell/prestera/prestera_main.c: ports = of_find_node_by_name(sw->np, "ports"); drivers/net/pse-pd/tps23881.c: channels_node = of_find_node_by_name(priv->np, "channels"); drivers/regulator/scmi-regulator.c: np = of_find_node_by_name(handle->dev->of_node, "regulators"); drivers/regulator/tps6594-regulator.c: np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name); Incorrect usage, as far as I understand all those drivers are looking for child nodes only. drivers/of/unittest.c: found = of_find_node_by_name(nd->overlay, "test-unittest16"); drivers/of/unittest.c: found = of_find_node_by_name(nd->overlay, "test-unittest17"); drivers/of/unittest.c: found = of_find_node_by_name(nd->overlay, "test-unittest18"); drivers/of/unittest.c: found = of_find_node_by_name(nd->overlay, "test-unittest19"); Here too I think only child nodes are meant to be considered. of_find_node_by_name() is very much misused as most callers want to find child nodes, while of_find_node_by_name() will walk the whole DT from a given starting point.
Agreed. In general, it's preferred to look for nodes by compatible rather than node name with child nodes being an exception.
I think the right fix here is to - Replace of_find_node_by_name(root, ...) with of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c (if my understanding of the code is correct). - Replace of_find_node_by_name() with of_get_child_by_name() in callers that need to search immediate children only (I expected that to be the majority of the above call sites).
+1
- If there are other callers that need to find indirect children, introduce a new of_get_child_by_name_recursive() function. At that point, the only remaining caller of of_find_node_by_name() (beside its usage in the for_each_node_by_name() macro) will be drivers/clk/ti/clk.c, which uses the function correctly. I'm tempted to then rename of_find_node_by_name() to __of_find_node_by_name() to indicate it's an internal function not meant to be called except in special cases. It could all be renamed to __of_find_next_node_by_name() to make its behaviour clearer.
+1 There's a number of functions which I classify as "don't add new users" which I keep meaning to document. It's generally older stuff still using them. Anything returning pointers to raw DT data are problematic for dynamic DT. Hence all the patches removing of_find_property/of_get_property calls. Rob