Re: [PATCH] PM: runtime: Allow rpm_resume() to succeed when runtime PM is disabled
From: "Rafael J. Wysocki" <rafael@kernel.org>
Date: 2021-11-26 18:31:44
Also in:
linux-arm-kernel, lkml
On Fri, Nov 26, 2021 at 7:00 PM Rafael J. Wysocki [off-list ref] wrote:
quoted hunk ↗ jump to hunk
On Friday, November 26, 2021 2:46:02 PM CET Ulf Hansson wrote:quoted
On Fri, 26 Nov 2021 at 14:30, Rafael J. Wysocki [off-list ref] wrote:quoted
On Fri, Nov 26, 2021 at 1:20 PM Ulf Hansson [off-list ref] wrote:quoted
On Mon, 1 Nov 2021 at 10:27, Ulf Hansson [off-list ref] wrote:quoted
On Fri, 29 Oct 2021 at 20:27, Rafael J. Wysocki [off-list ref] wrote:quoted
On Fri, Oct 29, 2021 at 12:20 AM Ulf Hansson [off-list ref] wrote:quoted
On Wed, 27 Oct 2021 at 16:33, Alan Stern [off-list ref] wrote:quoted
On Wed, Oct 27, 2021 at 12:55:43PM +0200, Ulf Hansson wrote:quoted
On Wed, 27 Oct 2021 at 04:02, Alan Stern [off-list ref] wrote:quoted
On Wed, Oct 27, 2021 at 12:26:26AM +0200, Ulf Hansson wrote:quoted
During system suspend, the PM core sets dev->power.is_suspended for the device that is being suspended. This flag is also being used in rpm_resume(), to allow it to succeed by returning 1, assuming that runtime PM has been disabled and the runtime PM status is RPM_ACTIVE, for the device. To make this behaviour a bit more useful, let's drop the check for the dev->power.is_suspended flag in rpm_resume(), as it doesn't really need to be limited to this anyway. Signed-off-by: Ulf Hansson <redacted> --- drivers/base/power/runtime.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index ec94049442b9..fadc278e3a66 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c@@ -742,8 +742,8 @@ static int rpm_resume(struct device *dev, int rpmflags) repeat: if (dev->power.runtime_error) retval = -EINVAL; - else if (dev->power.disable_depth == 1 && dev->power.is_suspended - && dev->power.runtime_status == RPM_ACTIVE) + else if (dev->power.disable_depth > 0 && + dev->power.runtime_status == RPM_ACTIVE)IIRC there was a good reason why the original code checked for disable_depth == 1 rather than > 0. But I don't remember exactly what the reason was. Maybe it had something to do with the fact that during a system sleep __device_suspend_late calls __pm_runtime_disable, and the code was checking that there were no other disables in effect.The check was introduced in the below commit: Commit 6f3c77b040fc Author: Kevin Hilman [off-list ref] Date: Fri Sep 21 22:47:34 2012 +0000 PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2 By reading the commit message it's pretty clear to me that the check was added to cover only one specific use case, during system suspend. That is, that a driver may want to call pm_runtime_get_sync() from a late/noirq callback (when the PM core has disabled runtime PM), to understand whether the device is still powered on and accessible.quoted
This is related to the documented behavior of rpm_resume (it's supposed to fail with -EACCES if the device is disabled for runtime PM, no matter what power state the device is in). That probably is also the explanation for why dev->power.is_suspended gets checked: It's how the code tells whether a system sleep is in progress.Yes, you are certainly correct about the current behaviour. It's there for a reason. On the other hand I would be greatly surprised if this change would cause any issues. Of course, I can't make guarantees, but I am, of course, willing to help to fix problems if those happen. As a matter of fact, I think the current behaviour looks quite inconsistent, as it depends on whether the device is being system suspended. Moreover, for syscore devices (dev->power.syscore is set for them), the PM core doesn't set the "is_suspended" flag. Those can benefit from a common behaviour. Finally, I think the "is_suspended" flag actually needs to be protected by a lock when set by the PM core, as it's being used in two separate execution paths. Although, rather than adding a lock for protection, we can just rely on the "disable_depth" in rpm_resume(). It would be easier and makes the behaviour consistent too.As long as is_suspended isn't _written_ in two separate execution paths, we're probably okay without a lock -- provided the code doesn't mind getting an indefinite result when a read races with a write.Well, indefinite doesn't sound very good to me for these cases, even if it most likely never will happen.quoted
quoted
quoted
So overall, I suspect this change should not be made. But some other improvement (like a nice comment) might be in order. Alan SternThanks for reviewing!You're welcome. Whatever you eventually decide to do should be okay with me. I just wanted to make sure that you understood the deeper issue here and had given it some thought. For example, it may turn out that you can resolve matters simply by updating the documentation.I observed the issue on cpuidle-psci. The devices it operates upon are assigned as syscore devices and these are hooked up to a genpd. A call to pm_runtime_get_sync() can happen even after the PM core has disabled runtime PM in the "late" phase. So the error code is received for these real use-cases. Now, as we currently don't check the return value of pm_runtime_get_sync() in cpuidle-psci, it's not a big deal. But it certainly seems worth fixing in my opinion. Let's see if Rafael has some thoughts around this.Am I thinking correctly that this is mostly about working around the limitations of pm_runtime_force_suspend()?No, this isn't related at all. The cpuidle-psci driver doesn't have PM callbacks, thus using pm_runtime_force_suspend() would not work here.Just wanted to send a ping on this to see if we can come to a conclusion. Or maybe we did? :-) I think in the end, what slightly bothers me, is that the behavior is a bit inconsistent. Although, maybe it's the best we can do.I've been thinking about this and it looks like we can do better, but instead of talking about this I'd rather send a patch.Alright. I was thinking along the lines of make similar changes for rpm_idle|suspend(). That would make the behaviour even more consistent, I think. Perhaps that's what you have in mind? :-)Well, not exactly. The idea is to add another counter (called restrain_depth in the patch) to prevent rpm_resume() from running the callback when that is potentially problematic. With that, it is possible to actually distinguish devices with PM-runtime enabled and it allows the PM-runtime status to be checked when it is still known to be meaningful. It requires quite a few changes, but is rather straightforward, unless I'm missing something. Please see the patch below. I've only checked that it builds on x86-64. --- drivers/base/power/main.c | 18 +++---- drivers/base/power/runtime.c | 105 ++++++++++++++++++++++++++++++++++++------- include/linux/pm.h | 2 include/linux/pm_runtime.h | 2 4 files changed, 101 insertions(+), 26 deletions(-) Index: linux-pm/include/linux/pm.h ===================================================================--- linux-pm.orig/include/linux/pm.h +++ linux-pm/include/linux/pm.h@@ -598,6 +598,7 @@ struct dev_pm_info { atomic_t usage_count; atomic_t child_count; unsigned int disable_depth:3; + unsigned int restrain_depth:3; /* PM core private */ unsigned int idle_notification:1; unsigned int request_pending:1; unsigned int deferred_resume:1;@@ -609,6 +610,7 @@ struct dev_pm_info { unsigned int use_autosuspend:1; unsigned int timer_autosuspends:1; unsigned int memalloc_noio:1; + unsigned int already_suspended:1; /* PM core private */ unsigned int links_count; enum rpm_request request; enum rpm_status runtime_status;Index: linux-pm/include/linux/pm_runtime.h ===================================================================--- linux-pm.orig/include/linux/pm_runtime.h +++ linux-pm/include/linux/pm_runtime.h@@ -46,6 +46,8 @@ extern void pm_runtime_enable(struct dev extern void __pm_runtime_disable(struct device *dev, bool check_resume); extern void pm_runtime_allow(struct device *dev); extern void pm_runtime_forbid(struct device *dev); +extern void pm_runtime_restrain(struct device *dev); +extern void pm_runtime_relinquish(struct device *dev); extern void pm_runtime_no_callbacks(struct device *dev); extern void pm_runtime_irq_safe(struct device *dev); extern void __pm_runtime_use_autosuspend(struct device *dev, bool use);Index: linux-pm/drivers/base/power/runtime.c ===================================================================--- linux-pm.orig/drivers/base/power/runtime.c +++ linux-pm/drivers/base/power/runtime.c@@ -744,11 +744,11 @@ static int rpm_resume(struct device *dev repeat: if (dev->power.runtime_error) retval = -EINVAL; - else if (dev->power.disable_depth == 1 && dev->power.is_suspended - && dev->power.runtime_status == RPM_ACTIVE) - retval = 1; else if (dev->power.disable_depth > 0) retval = -EACCES; + else if (dev->power.restrain_depth > 0) + retval = dev->power.runtime_status == RPM_ACTIVE ? 1 : -EAGAIN; + if (retval) goto out;@@ -1164,9 +1164,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_acti * @dev: Device to handle. * @status: New runtime PM status of the device. * - * If runtime PM of the device is disabled or its power.runtime_error field is - * different from zero, the status may be changed either to RPM_ACTIVE, or to - * RPM_SUSPENDED, as long as that reflects the actual state of the device. + * If runtime PM of the device is disabled or restrained, or its + * power.runtime_error field is nonzero, the status may be changed either to + * RPM_ACTIVE, or to RPM_SUSPENDED, as long as that reflects its actual state. * However, if the device has a parent and the parent is not active, and the * parent's power.ignore_children flag is unset, the device's status cannot be * set to RPM_ACTIVE, so -EBUSY is returned in that case.@@ -1195,13 +1195,16 @@ int __pm_runtime_set_status(struct devic spin_lock_irq(&dev->power.lock); /* - * Prevent PM-runtime from being enabled for the device or return an - * error if it is enabled already and working. + * Prevent PM-runtime from being used for the device or return an + * error if it is in use already. */ - if (dev->power.runtime_error || dev->power.disable_depth) - dev->power.disable_depth++; - else + if (dev->power.runtime_error || dev->power.disable_depth || + dev->power.restrain_depth) { + pm_runtime_get_noresume(dev); + dev->power.restrain_depth++; + } else { error = -EAGAIN; + } spin_unlock_irq(&dev->power.lock);@@ -1278,7 +1281,7 @@ int __pm_runtime_set_status(struct devic device_links_read_unlock(idx); } - pm_runtime_enable(dev); + pm_runtime_relinquish(dev); return error; }@@ -1513,6 +1516,72 @@ void pm_runtime_allow(struct device *dev EXPORT_SYMBOL_GPL(pm_runtime_allow); /** + * pm_runtime_restrain - Temporarily block runtime PM of a device. + * @dev: Device to handle. + * + * Increase the device's usage count and its restrain_dpeth count. If the + * latter was 0 initially, cancel the runtime PM work for @dev if pending and + * wait for all of the runtime PM operations on it in progress to complete. + * + * After this function has been called, attempts to runtime-suspend @dev will + * fail with -EAGAIN and attempts to runtime-resume it will succeed if its + * runtime PM status is RPM_ACTIVE and will fail with -EAGAIN otherwise. + * + * This function can only be called by the PM core. + */ +void pm_runtime_restrain(struct device *dev) +{ + pm_runtime_get_noresume(dev); + + spin_lock_irq(&dev->power.lock); + + if (dev->power.restrain_depth++ > 0) + goto out; + + if (dev->power.disable_depth > 0) { + dev->power.already_suspended = false; + goto out; + } + + /* Update time accounting before blocking PM-runtime. */ + update_pm_runtime_accounting(dev); + + __pm_runtime_barrier(dev); + + dev->power.already_suspended = pm_runtime_status_suspended(dev); + +out: + spin_unlock_irq(&dev->power.lock); +} + +/** + * pm_runtime_relinquish - Unblock runtime PM of a device. + * @dev: Device to handle. + * + * Decrease the device's usage count and its restrain_dpeth count. + * + * This function can only be called by the PM core. + */ +void pm_runtime_relinquish(struct device *dev) +{ + spin_lock_irq(&dev->power.lock); + + if (dev->power.restrain_depth > 0) { + dev->power.restrain_depth--; + + /* About to unbolck runtime PM, set accounting_timestamp to now */ + if (!dev->power.restrain_depth && !dev->power.disable_depth) + dev->power.accounting_timestamp = ktime_get_mono_fast_ns(); + } else { + dev_warn(dev, "Unbalanced %s!\n", __func__); + } + + spin_unlock_irq(&dev->power.lock); + + pm_runtime_put_noidle(dev); +} + +/** * pm_runtime_no_callbacks - Ignore runtime PM callbacks for a device. * @dev: Device to handle. *@@ -1806,8 +1875,10 @@ int pm_runtime_force_suspend(struct devi int (*callback)(struct device *); int ret; - pm_runtime_disable(dev); - if (pm_runtime_status_suspended(dev)) + pm_runtime_restrain(dev); + + /* No suspend if the device has already been suspended by PM-runtime. */ + if (!dev->power.already_suspended)
I got the check here the other way around, sorry.
quoted hunk ↗ jump to hunk
return 0; callback = RPM_GET_CALLBACK(dev, runtime_suspend);@@ -1832,7 +1903,7 @@ int pm_runtime_force_suspend(struct devi return 0; err: - pm_runtime_enable(dev); + pm_runtime_relinquish(dev); return ret; } EXPORT_SYMBOL_GPL(pm_runtime_force_suspend);@@ -1854,7 +1925,7 @@ int pm_runtime_force_resume(struct devic int (*callback)(struct device *); int ret = 0; - if (!pm_runtime_status_suspended(dev) || !dev->power.needs_force_resume) + if (!dev->power.already_suspended || !dev->power.needs_force_resume)
And here I probably should leave the original check the way it is.
quoted hunk ↗ jump to hunk
goto out; /*@@ -1874,7 +1945,7 @@ int pm_runtime_force_resume(struct devic pm_runtime_mark_last_busy(dev); out: dev->power.needs_force_resume = 0; - pm_runtime_enable(dev); + pm_runtime_relinquish(dev); return ret; } EXPORT_SYMBOL_GPL(pm_runtime_force_resume);Index: linux-pm/drivers/base/power/main.c ===================================================================--- linux-pm.orig/drivers/base/power/main.c +++ linux-pm/drivers/base/power/main.c@@ -809,7 +809,7 @@ Skip: Out: TRACE_RESUME(error); - pm_runtime_enable(dev); + pm_runtime_relinquish(dev); complete_all(&dev->power.completion); return error; }@@ -907,8 +907,8 @@ static int device_resume(struct device * goto Complete; if (dev->power.direct_complete) { - /* Match the pm_runtime_disable() in __device_suspend(). */ - pm_runtime_enable(dev); + /* Match the pm_runtime_restrict() in __device_suspend(). */ + pm_runtime_relinquish(dev); goto Complete; }@@ -1392,7 +1392,7 @@ static int __device_suspend_late(struct TRACE_DEVICE(dev); TRACE_SUSPEND(0); - __pm_runtime_disable(dev, false); + pm_runtime_restrain(dev); dpm_wait_for_subordinate(dev, async);@@ -1627,9 +1627,9 @@ static int __device_suspend(struct devic * callbacks for it. * * If the system-wide suspend callbacks below change the configuration - * of the device, they must disable runtime PM for it or otherwise - * ensure that its runtime-resume callbacks will not be confused by that - * change in case they are invoked going forward. + * of the device, they must ensure that its runtime-resume callbacks + * will not be confused by that change in case they are invoked going + * forward. */ pm_runtime_barrier(dev);@@ -1648,13 +1648,13 @@ static int __device_suspend(struct devic if (dev->power.direct_complete) { if (pm_runtime_status_suspended(dev)) { - pm_runtime_disable(dev); + pm_runtime_restrain(dev); if (pm_runtime_status_suspended(dev)) { pm_dev_dbg(dev, state, "direct-complete "); goto Complete; } - pm_runtime_enable(dev); + pm_runtime_relinquish(dev); } dev->power.direct_complete = false; }