Re: PM regression with commit 5de85b9d57ab PM runtime re-init in v4.5-rc1
From: Ulf Hansson <hidden>
Date: 2016-02-01 16:44:01
Also in:
linux-arm-kernel, linux-omap
On 28 January 2016 at 17:58, Tony Lindgren [off-list ref] wrote:
Hi, * Ulf Hansson [off-list ref] [160128 06:30]:quoted
On 27 January 2016 at 00:52, Tony Lindgren [off-list ref] wrote:quoted
* Rafael J. Wysocki [off-list ref] [160126 15:38]:quoted
On Wed, Jan 27, 2016 at 12:22 AM, Tony Lindgren [off-list ref] wrote:quoted
* Rafael J. Wysocki [off-list ref] [160126 15:15]:quoted
On Tue, Jan 26, 2016 at 11:48 PM, Tony Lindgren [off-list ref] wrote:quoted
Hi, Looks like commit 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe error and driver unbind") broke PM on at leastI need to understand the issue here in a bit more detail, could you please try to fill out some of my gaps!? In *what way* did it break PM?The MMC hardware will not get idled properly any longer blocking any deeper idle states.quoted
Did the driver not probe successfully the second try? If so, what happened.It probes fine after a -EPROBE_DEFER on the vmmc i2c regulator. But the PM runtime usecounts are wrong.
Okay. How did you verify this? I think the easiest way would be to make sure the usage count is *one*, just before the omap_hsmmc calls mmc_add_host() in its ->probe() function. If it's *two*, that confirms this theory.
quoted
quoted
quoted
quoted
quoted
quoted
omap3. It seems we now need to additionally also call pm_runtime_dont_use_autosuspend() to get things working again? The following fixes idling on omap3, but I'm wondering if we should do something in pm_runtime_reinit() instead?I understand this as the second (or third, forth, whatever) probing attempt actually succeeds, right!?Right.quoted
Is the issue you are seeing, that the device never becomes runtime suspended due to commit 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe error and driver unbind")?Correct.quoted
quoted
quoted
quoted
Looks like we have RPM_ACTIVE set in pm_runtime_reinit() if that gives any clues.That's a good clue. Although, there could be several reasons to why. Rafael has pointed out one valid potential case, but I want to be sure that's really what happening here. *If* the problem is that the device doesn't becomes runtime suspended, that will anyway be prevented as long as the autosuspend_delay has been set to a negative value. That's why I wonder whether this really is the case here.That seems to be the case here. In device driver probe, commenting out pm_runtime_set_autosuspend_delay(host->dev, MMC_AUTOSUSPEND_DELAY) also makes things work again.
Okay. Interesting. :-)
So one of the failing cases seems to be: 1. Device driver probe sets pm_runtime_set_autosuspend_delay()
Only when "someone" in-before has set the autosuspend delay to a
negative value, this will affect the usage count.
I didn't find any in-kernel user that would set this for the
omap_hsmmc device, so it needs to be done from userspace via sysfs. It
would be really great if you could help to confirm this.
But more importantly, I fail to understand why commit 5de85b9d57ab
("PM / runtime: Re-init runtime PM states at probe error and driver
unbind"), is changing the behaviour in this regards.
Potentially we might have a different runtime PM status than we had
before when probing the second try, but how could that mess up the
usage count...?
2. Device driver probe initially fails with -EPROBE_DEFER
So that happened before as well. Although, we now get a different runtime PM status (suspended) the second probe time.
3. PM runtime usecounts get messed up
Perhaps, but let's try to validate that as it should be rather easy.
quoted
For omap3, I assume there's a PM domain (the so called hwmod) being attached to the omap_hsmmc device at device registration point!?Correct. It uses the notifiers like bus code does in general.
Yes, there perfectly okay. Although, I wondering whether it could be that it's the PM domain that's preventing the omap_hsmmc device from becoming runtime suspended. Perhaps the PM domain returns an error code from its ->runtime_suspend() callback?
FYI, most SoCs don't have hardware based autoidle signaling between the interconnect and the interconnect targets. So the hwmod code is still needed until we have converted it into a proper interconnect + module target drivers.quoted
In that path depending on a specific hwmod flag, the device will be given the an initial *active* runtime PM status, via invoking pm_runtime_set_active(). *If* that's the case, it affects the probing sequence, as the pm_runtime_get_sync() won't trigger the ->runtime_resume() callbacks to be invoked in the first probe attempt.It has worked since pm runtime. And it works with MMC as as a loadable module just fine when no -EPROBE_DEFER happens.
Okay, so we definitely seem to have an issue with the changed runtime
PM status (suspended) the second probe time.
That's indeed caused by 5de85b9d57ab ("PM / runtime: Re-init runtime
PM states at probe error and driver unbind").
quoted
Moreover, according to the data I received in this regression report so far, it seems like probing the second time has *always* been done with the device in runtime PM active state. Could that be the reason to why it "happens" to work?Not correct. I think that speculation is not related to the $subject regression at all.
I think it's important in this context, as it could affect how the PM
domain may treat the device. As I mentioned earlier.
For example due to commit 5de85b9d57ab ("PM / runtime: Re-init runtime
PM states at probe error and driver unbind"), the ->runtime_resume()
callback seems now to be called twice in a row, without in-between
having the ->runtime_suspend() callback being invoked. Does really the
PM domain cope with that correctly?
BTW, do you have some hardware to test with that has PM runtime implemnted and actually working with mainline kernel?
Oh, yes. Although I don't have an omap3, I wish I had. Also, I have locally a "runtime PM test driver", which helps me to test various runtime PM sequences. Kind regards Uffe