Thread (39 messages), 4 authors, 2015-09-15

[PATCH v3 0/18] On-demand device probing

From: Tomeu Vizoso <hidden>
Date: 2015-08-07 06:56:11
Also in: linux-acpi, linux-devicetree, lkml

On 6 August 2015 at 22:14, Rob Herring [off-list ref] wrote:
On Thu, Aug 6, 2015 at 9:11 AM, Tomeu Vizoso [off-list ref] wrote:
Hello,

I have a problem with the panel on my Tegra Chromebook taking longer
than expected to be ready during boot (Stéphane Marchesin reported what
is basically the same issue in [0]), and have looked into ordered
probing as a better way of solving this than moving nodes around in the
DT or playing with initcall levels and linking order.

While reading the thread [1] that Alexander Holler started with his
series to make probing order deterministic, it occurred to me that it
should be possible to achieve the same by probing devices as they are
referenced by other devices.

This basically reuses the information that is already implicit in the
probe() implementations, saving us from refactoring existing drivers or
adding information to DTBs.

During review of v1 of this series Linus Walleij suggested that it
should be the device driver core to make sure that dependencies are
ready before probing a device. I gave this idea a try [2] but Mark Brown
pointed out the logic duplication between the resource acquisition
and dependency discovery code paths (though I think it's fairly minor).

To address that code duplication I experimented with Arnd's devm_probe
[3] concept of having drivers declare their dependencies instead of
acquiring them during probe, and while it worked [4], I don't think we
end up winning anything when compared to just probing devices on-demand
from resource getters.

One remaining objection is to the "sprinkling" of calls to
fwnode_ensure_device() in the resource getters of each subsystem, but I
think it's the right thing to do given that the storage of resources is
currently subsystem-specific.

We could avoid the above by moving resource storage into the core, but I
don't think there's a compelling case for that.
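
As a rough illustration of what probing on demand from the resource
getters means, here is a small user-space model in C. Everything in it is
hypothetical: the model_* names, the device table and the dependency lists
are made up for this sketch, and model_ensure_device() only stands in for
the role that fwnode_ensure_device() plays in this series. It is not the
kernel implementation.

```c
/* User-space model only: none of these types are the kernel's real ones. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

struct model_device {
	const char *name;
	const char *deps[MAX_DEPS];	/* suppliers looked up during probe */
	int probed;			/* 1 once probe has completed */
	int probe_order;		/* position in the overall probe sequence */
};

/* Registered in a "bad" order: the consumer comes before its suppliers. */
static struct model_device devices[] = {
	{ .name = "panel",     .deps = { "backlight", "regulator" } },
	{ .name = "backlight", .deps = { "pwm" } },
	{ .name = "pwm" },
	{ .name = "regulator" },
};

#define NUM_DEVICES (sizeof(devices) / sizeof(devices[0]))

static int probe_counter;

static void model_probe(struct model_device *dev);

static struct model_device *find_device(const char *name)
{
	size_t i;

	for (i = 0; i < NUM_DEVICES; i++)
		if (strcmp(devices[i].name, name) == 0)
			return &devices[i];
	return NULL;
}

/* Stand-in for fwnode_ensure_device(): probe the supplier now if needed. */
static void model_ensure_device(struct model_device *dev)
{
	if (!dev->probed)
		model_probe(dev);
}

/* Stand-in for a subsystem resource getter (regulator, PWM, ...). */
static struct model_device *model_get_resource(const char *name)
{
	struct model_device *supplier = find_device(name);

	assert(supplier != NULL);	/* this model has no missing suppliers */
	model_ensure_device(supplier);
	return supplier;
}

/* A probe acquires its resources first; the model assumes no cycles. */
static void model_probe(struct model_device *dev)
{
	size_t i;

	for (i = 0; i < MAX_DEPS && dev->deps[i]; i++)
		model_get_resource(dev->deps[i]);

	dev->probed = 1;
	dev->probe_order = ++probe_counter;
}

/* Walk the registration order; returns the number of probes performed. */
static int model_probe_all(void)
{
	size_t i;

	for (i = 0; i < NUM_DEVICES; i++)
		model_ensure_device(&devices[i]);
	return probe_counter;
}
```

With the table above, walking the devices in registration order ends up
probing pwm, backlight, regulator and then panel, so each device is probed
exactly once and nothing has to be deferred and retried.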

I have tested this on boards with Tegra, i.MX6, Exynos and OMAP SoCs,
and these patches were enough to eliminate all the deferred probes
(except one in PandaBoard because omap_dma_system doesn't have a
firmware node as of yet).

Have submitted a branch [5] with these patches to kernelci.org and I'm
currently trying to fix all regressions, usually due to code assuming
that devices will be probed in a specific order. Current results [6] are
348 passes, 30 fails and 42 unknowns (linux-next [7] is currently
387/3/23).
This is a bit worrying. If this causes a high number of boot failures,
fixing the errors you can find is not the path forward as we can't
test a lot of platforms (and many people don't look at -next). We may
want to put this behind a kconfig option so that we can easily restore
old behavior if needed. Otherwise, we could have to revert the series.
A Kconfig sounds fine to me. Altogether, I don't think it's that bad
because only these boards are known to have broken because of this
series:

at91-sama5d3_xplained
sama5d35ek

ste-snowball

vexpress-v2p-ca15
vexpress-v2p-ca15
vexpress-v2p-ca15_a7
vexpress-v2p-ca15-tc1
vexpress-v2p-ca9

I assume there's only 3 different bugs to fix there, plus a race in
imx boards that I have only papered over with a delay so far.

The failure rate seems to be so high because each boot is a
combination of board+defconfig and there are duplicated boards in
several labs and many were just offline at that moment.

But I agree that there's no way I can test it on all supported hw, so
a Kconfig that people can quickly switch on to disable the feature
sounds good to me.
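
One way such a switch could look, sketched in user-space C: the call made
from the resource getters compiles down to an empty stub when the option is
off, so the old probing behavior comes back without touching the callers.
The symbol CONFIG_ON_DEMAND_PROBE_MODEL and the model_* names are invented
for this sketch; the thread only agrees that some Kconfig option should
exist, not what it is called or how it is wired up.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a firmware node; not the kernel's type. */
struct model_fwnode {
	bool probed;
};

#ifdef CONFIG_ON_DEMAND_PROBE_MODEL	/* hypothetical option name */
/* Feature enabled: the getter makes sure the supplier is probed. */
static inline void model_ensure_device(struct model_fwnode *fwnode)
{
	fwnode->probed = true;
}
#else
/* Feature disabled: empty stub, callers fall back to the old behavior. */
static inline void model_ensure_device(struct model_fwnode *fwnode)
{
	(void)fwnode;
}
#endif

/* Reports whether a getter call caused the supplier to be probed. */
static inline bool model_getter_probed_supplier(void)
{
	struct model_fwnode supplier = { .probed = false };

	model_ensure_device(&supplier);
	return supplier.probed;
}
```

Built without -DCONFIG_ON_DEMAND_PROBE_MODEL, the getter leaves the supplier
untouched; built with it, the supplier is marked as probed.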
Are all the commits before this series fixing boot failures? You can't
do dts updates as the fix or backwards compatibility will be broken.
The gpio-ranges fix for Tegra has a commit that safeguards backwards
compatibility, and the typo in regulator names for ux500 doesn't
really break anything that I can see, I just stumbled into it when
trying to blindly fix the boot for ste-snowball (I don't have access
to that hw).
With this series I get the kernel to output to the panel in 0.5s,
instead of 2.8s.

Regards,

Tomeu

[0] http://lists.freedesktop.org/archives/dri-devel/2014-August/066527.html

[1] https://lkml.org/lkml/2014/5/12/452

[2] https://lkml.org/lkml/2015/6/17/305

[3] http://article.gmane.org/gmane.linux.ports.arm.kernel/277689

[4] https://lkml.org/lkml/2015/7/21/441a

[5] https://git.collabora.com/cgit/user/tomeu/linux.git/log/?h=on-demand-probes-v5

[6] http://kernelci.org/boot/all/job/collabora/kernel/v4.2-rc5-6548-g632b98c83840/

[7] http://kernelci.org/boot/all/job/next/kernel/next-20150806/

Changes in v3:
- Only delay platform devices with OF nodes
- Set and use device_node.platform_dev instead of reversing the logic to
  find the platform device that encloses a device node.
I still want this to be a struct device and not a struct
platform_device and am not convinced it can't be. It can simply be an
optimization of the existing function:
Now I realize what you meant, that makes sense to me.

Thanks,

Tomeu
struct platform_device *of_find_device_by_node(struct device_node *np)
{
  if (np->device && np->device->bus == &platform_bus_type)
    return to_platform_device(np->device);
  return NULL;
}

Rob