Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving
From: Saravana Kannan <hidden>
Date: 2021-02-12 21:00:44
Also in:
linux-acpi, linux-clk, linux-devicetree, linux-pm, linux-renesas-soc, lkml
On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven [off-list ref] wrote:
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan [off-list ref] wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven [off-list ref] wrote:
> > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > >   - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device node OF_POPULATED after init") is no longer needed (but already queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need it anymore, so maybe work it out with him? It's a balance between some wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> Rob: should it be reverted? For v5.13?
> I guess other similar "fixes" went in in the meantime.
> > >   - Some devices are reprobed, despite their drivers returning a real error code, and not -EPROBE_DEFER:
> >
> > Sorry, it's not obvious from the logs below where "reprobing" is happening. Can you give more pointers please?
>
> My log was indeed not a full log, but just the reprobes happening. I'll send you a full log by private email.
> > Also, thinking more about this, the only way I could see this happen is:
> > 1. Device fails with error that's not -EPROBE_DEFER
> > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER flag) where it's a consumer.
> > 3. The supplier probes and the device gets added to the deferred probe list again.
> >
> > But I can't see how this sequence can happen. Device links are created only when a device is added. And if the supplier isn't added yet, the consumer wouldn't have probed in the first place.
>
> The full log doesn't show any evidence of the device being added to a list in between the two probes.
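
The expectation discussed above (only an -EPROBE_DEFER return should put a device back on the deferred probe list) can be illustrated with a toy model. This is a minimal sketch in Python, not kernel code: `Device`, `DriverCore`, and `demo` are hypothetical names, and the model ignores device links entirely. It only shows why a driver returning a real error such as -ENODEV would normally be probed once.

```python
EPROBE_DEFER = 517  # kernel-internal "try again later" code
ENODEV = 19         # a "real" error, e.g. no card present


class Device:
    """Toy device: probe_fn returns 0 on success or a negative error code."""

    def __init__(self, name, probe_fn):
        self.name = name
        self.probe_fn = probe_fn
        self.attempts = 0
        self.bound = False


class DriverCore:
    """Toy driver core that keeps a deferred probe list (hypothetical model)."""

    def __init__(self):
        self.deferred = []

    def probe(self, dev):
        dev.attempts += 1
        ret = dev.probe_fn()
        if ret == 0:
            dev.bound = True
            # A newly bound device may be a supplier, so retry deferred devices.
            self.retry_deferred()
        elif ret == -EPROBE_DEFER:
            self.deferred.append(dev)
        # Any other error: the device is not queued for another attempt.
        return ret

    def retry_deferred(self):
        pending, self.deferred = self.deferred, []
        for dev in pending:
            self.probe(dev)


def demo():
    core = DriverCore()
    dma = Device("dma", lambda: 0)
    # i2c defers until its (assumed) supplier, dma, is bound.
    i2c = Device("i2c", lambda: 0 if dma.bound else -EPROBE_DEFER)
    # pcie fails with a real error because no card is present.
    pcie = Device("pcie", lambda: -ENODEV)
    for dev in (pcie, i2c, dma):
        core.probe(dev)
    return {d.name: (d.attempts, d.bound) for d in (pcie, i2c, dma)}
```

In this model `demo()` probes `pcie` exactly once, while `i2c` is probed twice: once before `dma` binds and once from the deferred list. A second probe of a device like `pcie` is the behaviour the thread is trying to explain.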
> > Other than "annoying waste of time" is this causing any other problems?
>
> Probably not. But see below.
> > >   - The PCI reprobing leads to a memory leak, for which I've sent a fix
> > >     "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> > >     https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/
> >
> > Wrt PCI reprobing,
> > 1. Is this PCI never expected to probe, but it's being reattempted despite the NOT EPROBE_DEFER error? Or
>
> There is no PCIe card present, so the failure is expected. Later it is reprobed, which of course fails again.
> > 2. The PCI was deferred probe when it should have probed and then when it's finally reattempted and it could succeed, we are hitting this mem leak issue?
>
> I think the leak has always been there, but it was just exposed by this unneeded reprobe. I don't think a reprobe after that specific error path had ever happened before.
> > I'm basically trying to distinguish between "this stuff should never be retried" vs "this/its suppliers got probe deferred with fw_devlink=on but didn't get probe deferred with fw_devlink=permissive and that's causing issues"
>
> There should not be a probe deferral, as no -EPROBE_DEFER was returned.
> > >   - I2C on R-Car Gen3 does not seem to use DMA, according to /sys/kernel/debug/dmaengine/summary:
> > >
> > >       -dma4chan0 | e66d8000.i2c:tx
> > >       -dma4chan1 | e66d8000.i2c:rx
> > >       -dma5chan0 | e6510000.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it. I'm also very unfamiliar with that file. With fw_devlink=permissive, I2C was using DMA? If so, the next step is to see if the I2C relative probe order with DMA is getting changed and if so, why.
>
> Yes, I plan to dig deeper to see what really happens...
Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes it and you also don't see this issue with fw_devlink=permissive, then it means there's probably some unnecessary probe deferral that we should try to avoid. At least, that's my hunch right now.

Thanks,
Saravana
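
When comparing boots with different fw_devlink settings, it may help to diff the channel assignments in /sys/kernel/debug/dmaengine/summary mechanically. The following is a minimal sketch, assuming the file contains lines shaped like the `dmaNchanM | client:name` entries quoted in this thread. `parse_summary` and `lost_channels` are hypothetical helper names, not an existing tool.

```python
def parse_summary(text):
    """Map channel name to client for lines like ' dma4chan0 | e66d8000.i2c:tx'.

    Lines without a '|' separator (e.g. controller headers) are skipped.
    The line format is an assumption based on the excerpt in this thread.
    """
    channels = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        chan, client = (part.strip() for part in line.split("|", 1))
        if chan and client:
            channels[chan] = client
    return channels


def lost_channels(before, after):
    """Return channels whose client in 'before' is missing or changed in 'after'."""
    old, new = parse_summary(before), parse_summary(after)
    return {chan: client for chan, client in old.items() if new.get(chan) != client}


# Sample input taken from the three entries quoted in this thread.
SAMPLE = """\
 dma4chan0 | e66d8000.i2c:tx
 dma4chan1 | e66d8000.i2c:rx
 dma5chan0 | e6510000.i2c:tx
"""
```

Feeding it one summary captured with fw_devlink=permissive and one with fw_devlink=on would list the I2C channels that are no longer claimed, which narrows down which consumers lost their DMA supplier.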
> > >   - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > >       BUG: spinlock bad magic on CPU#0, swapper/1
> > Hmm... I looked at this in bits and pieces throughout the day. At least spent an hour looking at this. This doesn't make a lot of sense to me. I don't even touch anything in this code path AFAICT. Are modules/kernel mixed up somehow? I need more info before I can help. Does reverting my pm domain change make any difference (assuming it boots this far without it)?
>
> I plan to dig deeper to see what really happens...
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds