Thread: 54 messages, 10 authors, 2021-04-21

Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

From: Saravana Kannan <hidden>
Date: 2021-02-12 21:00:44
Also in: linux-acpi, linux-clk, linux-devicetree, linux-pm, linux-renesas-soc, lkml

On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven
[off-list ref] wrote:
Hi Saravana,

On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan [off-list ref] wrote:
On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven [off-list ref] wrote:
  1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).

      - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
        node OF_POPULATED after init") is no longer needed (but already
        queued for v5.12 anyway)
Rob doesn't like the proliferation of OF_POPULATED, and we don't need
it anymore, so maybe work it out with him? It's a trade-off between
some wasted memory (extra struct devices) and not proliferating
OF_POPULATED.
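To make the trade-off concrete, here is a toy model in plain userspace C of how DT population skips nodes that already carry OF_POPULATED. It is not kernel code: `struct toy_node` and `toy_populate()` are hypothetical stand-ins. Code that initializes a node early and sets the flag saves one struct device; leaving the flag unset costs one extra struct device per such node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DT node; not the kernel's struct device_node. */
struct toy_node {
	const char *name;
	bool populated;   /* plays the role of the OF_POPULATED flag */
};

/*
 * Toy "populate devices from the DT" pass: every node not already marked
 * populated gets a (simulated) struct device and is then marked, so a
 * second pass creates nothing. Returns how many devices were created.
 */
static size_t toy_populate(struct toy_node *nodes, size_t n)
{
	size_t created = 0;

	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].populated)
			continue;            /* claimed by early init code: no device */
		nodes[i].populated = true;
		created++;                   /* one struct device allocated */
	}
	return created;
}
```

With a node such as the SYSC marked up front, only the remaining nodes cost a struct device; dropping the early marking trades that memory for one fewer OF_POPULATED user.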
Rob: should it be reverted?  For v5.13?
I guess other similar "fixes" went in in the meantime.
      - Some devices are reprobed, despite their drivers returning
        a real error code, and not -EPROBE_DEFER:
Sorry, it's not obvious from the logs below where "reprobing" is
happening. Can you give more pointers please?
My log was indeed not a full log, but just the reprobes happening.
I'll send you a full log by private email.
Also, thinking more about this, the only way I can see this happening is:
1. The device fails to probe with an error other than -EPROBE_DEFER.
2. It somehow gets added to a device link (with the AUTOPROBE_CONSUMER
flag) as the consumer.
3. The supplier probes and the device gets added to the deferred probe
list again.

But I can't see how this sequence can happen. Device links are created
only when a device is added. And if the supplier isn't added yet, the
consumer wouldn't have probed in the first place.
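The three-step sequence above can be sketched as a small state model. This is a simplification, not the driver core: all `toy_*` names are hypothetical, each device has at most one supplier, and the `autoprobe_consumer` field stands in for DL_FLAG_AUTOPROBE_CONSUMER. It encodes the rule that only -EPROBE_DEFER puts a device back on the deferred-probe list, and shows that a real failure is retried only if step 2 (a link created after the failure) actually happens.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517   /* kernel-internal errno value, not exported by libc */
#endif

/* Hypothetical device model; not the kernel's struct device. */
struct toy_dev {
	const char *name;
	int probe_ret;              /* what the fake driver's probe() returns */
	int probe_calls;            /* how many times probe() actually ran */
	bool probed;                /* bound successfully */
	bool on_deferred_list;
	struct toy_dev *supplier;   /* at most one supplier in this model */
	bool autoprobe_consumer;    /* stands in for DL_FLAG_AUTOPROBE_CONSUMER */
};

/* Step 1: try to probe; only -EPROBE_DEFER leads to another attempt later. */
static void toy_try_probe(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->on_deferred_list = false;
	if (dev->supplier && !dev->supplier->probed) {
		dev->on_deferred_list = true;   /* wait for supplier; probe() not called */
		return;
	}
	dev->probe_calls++;
	dev->probed = (dev->probe_ret == 0);
	dev->on_deferred_list = (dev->probe_ret == -EPROBE_DEFER);
}

/* Step 3: when a supplier binds, its unbound autoprobe consumers are queued again. */
static void toy_supplier_bound(struct toy_dev *supplier,
			       struct toy_dev **consumers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_dev *c = consumers[i];

		if (c->supplier == supplier && c->autoprobe_consumer && !c->probed)
			c->on_deferred_list = true;
	}
}

/* Retry every device currently on the deferred-probe list. */
static void toy_run_deferred(struct toy_dev **devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i]->on_deferred_list)
			toy_try_probe(devs[i]);
}
```

In this model, a device that fails with -ENODEV is only probed a second time if a link to it is added after the failure; without step 2, `toy_supplier_bound()` never touches it.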
The full log doesn't show any evidence of the device being added
to a list in between the two probes.
Other than being an "annoying waste of time", is this causing any other problems?
Probably not.  But see below.
      - The PCI reprobing leads to a memory leak, for which I've sent a fix
        "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
        https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/ (local)
Wrt PCI reprobing,
1. Is this PCI device never expected to probe, but probing is being
reattempted despite the non-EPROBE_DEFER error? Or
There is no PCIe card present, so the failure is expected.
Later it is reprobed, which of course fails again.
2. Was the PCI probe deferred when it should have probed, and then,
when it's finally reattempted and could succeed, are we hitting this
memory leak?
I think the leak has always been there, but it was just exposed by
this unneeded reprobe.  I don't think a reprobe after that specific
error path had ever happened before.
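The general shape of a leak that only shows up on a second call can be sketched as follows. This is a generic illustration, not the code of pci_register_io_range() or of the posted fix: the registry, the `toy_*` names, and the assumption that the second registration fails with -EEXIST are all hypothetical. An allocation made before a call that can fail is never freed on the error path, which goes unnoticed as long as the function only ever runs once.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical I/O range registry; not the kernel's logic_pio code. */
struct toy_range {
	unsigned long start;
	unsigned long size;
};

#define TOY_MAX_RANGES 8

static struct toy_range *toy_ranges[TOY_MAX_RANGES];
static int toy_nranges;
static int toy_live_allocs;   /* allocations not yet freed; a leak makes this grow */

static struct toy_range *toy_range_alloc(unsigned long start, unsigned long size)
{
	struct toy_range *r = malloc(sizeof(*r));

	if (r) {
		r->start = start;
		r->size = size;
		toy_live_allocs++;
	}
	return r;
}

static void toy_range_free(struct toy_range *r)
{
	assert(r != NULL);
	free(r);
	toy_live_allocs--;
}

/* Registration fails if the range is already known or the table is full. */
static int toy_register(struct toy_range *r)
{
	for (int i = 0; i < toy_nranges; i++)
		if (toy_ranges[i]->start == r->start)
			return -EEXIST;
	if (toy_nranges == TOY_MAX_RANGES)
		return -ENOSPC;
	toy_ranges[toy_nranges++] = r;
	return 0;
}

/* Leaky: on failure nobody frees r; harmless until the caller runs twice. */
static int toy_register_io_range_leaky(unsigned long start, unsigned long size)
{
	struct toy_range *r = toy_range_alloc(start, size);

	if (!r)
		return -ENOMEM;
	return toy_register(r);
}

/* Fixed: release the allocation on the error path. */
static int toy_register_io_range_fixed(unsigned long start, unsigned long size)
{
	struct toy_range *r = toy_range_alloc(start, size);
	int ret;

	if (!r)
		return -ENOMEM;
	ret = toy_register(r);
	if (ret)
		toy_range_free(r);
	return ret;
}
```

The first "probe" registers the range; a "reprobe" through the leaky variant orphans a second allocation, while the fixed variant returns the same error without leaking.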
I'm basically trying to distinguish between "this should never be
retried" and "this device or its suppliers got probe-deferred with
fw_devlink=on but not with fw_devlink=permissive, and that's causing
issues".
There should not be a probe deferral, as no -EPROBE_DEFER was
returned.
      - I2C on R-Car Gen3 does not seem to use DMA, according to
        /sys/kernel/debug/dmaengine/summary:

            -dma4chan0    | e66d8000.i2c:tx
            -dma4chan1    | e66d8000.i2c:rx
            -dma5chan0    | e6510000.i2c:tx
I think I need more context on the problem before I can try to fix it.
I'm also very unfamiliar with that file. With fw_devlink=permissive,
was I2C using DMA? If so, the next step is to see whether the relative
probe order of I2C and DMA is changing, and if so, why.
Yes, I plan to dig deeper to see what really happens...
Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes
it and you also don't see this issue with fw_devlink=permissive, then
it means there's probably some unnecessary probe deferral that we
should try to avoid. At least, that's my hunch right now.
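Saravana's hunch can be pictured with a toy model of how the fw_devlink modes gate probing. The mode semantics follow the kernel-parameters documentation (permissive: links are used only to order sync_state(); on: links enforce probe ordering; fw_devlink.strict=1: all inferred dependencies are treated as mandatory). Everything else is an assumption for illustration: that the DMA supplier counts as optional under plain fw_devlink=on, that the I2C driver silently falls back to PIO when DMA is not yet available, and all `toy_*` names. It is not a description of the R-Car I2C driver.

```c
#include <assert.h>
#include <stdbool.h>

/* TOY_ON_STRICT stands for fw_devlink=on combined with fw_devlink.strict=1. */
enum toy_fw_devlink { TOY_PERMISSIVE, TOY_ON, TOY_ON_STRICT };

/* Must the consumer wait for this supplier before its probe() may run? */
static bool toy_must_wait(enum toy_fw_devlink mode, bool optional_supplier)
{
	switch (mode) {
	case TOY_PERMISSIVE:
		return false;               /* links only order sync_state(), not probe */
	case TOY_ON:
		return !optional_supplier;  /* assumption: optional suppliers don't block */
	case TOY_ON_STRICT:
		return true;                /* every inferred dependency is mandatory */
	}
	return false;
}

/*
 * Hypothetical I2C driver that uses DMA only if the DMA controller is
 * already bound when I2C probes, and falls back to PIO otherwise.
 * dma_probed_first describes the probe order that happened to occur.
 */
static bool toy_i2c_uses_dma(enum toy_fw_devlink mode, bool dma_is_optional,
			     bool dma_probed_first)
{
	assert(mode == TOY_PERMISSIVE || mode == TOY_ON || mode == TOY_ON_STRICT);
	if (toy_must_wait(mode, dma_is_optional))
		return true;                /* I2C cannot probe before DMA */
	return dma_probed_first;
}
```

In the model, strict mode only helps if the supplier eventually probes; that is presumably why IOMMU support has to be enabled as well, since a mandatory supplier that never probes would block its consumers indefinitely.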

Thanks,
Saravana
      - On R-Mobile A1, I get a BUG and a memory leak:

            BUG: spinlock bad magic on CPU#0, swapper/1
Hmm... I looked at this in bits and pieces throughout the day, and
spent at least an hour on it. This doesn't make a lot of sense to me;
AFAICT I don't even touch anything in this code path. Are the modules
and the kernel mixed up somehow? I need more info before I can help.
Does reverting my PM domain change make any difference (assuming it
boots this far without it)?
I plan to dig deeper to see what really happens...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds