Thread (5 messages) 5 messages, 4 authors, 2021-02-24

Re: [REPORT_ISSUE]: RK3399 pd power down failed

From: "Rafael J. Wysocki" <rafael@kernel.org>
Date: 2021-02-24 12:50:59
Also in: linux-rockchip

On Wed, Feb 24, 2021 at 10:46 AM Ulf Hansson [off-list ref] wrote:
On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki [off-list ref] wrote:
quoted
On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
quoted
On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
[off-list ref] wrote:
quoted
Hi, Heiko :

In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.

log show:
Open the vop
#modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a

close the vop
#enter

 # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
domain                          status          slaves
    /device                                             runtime status
----------------------------------------------------------------------
pd_vopl                         off
pd_vopb                         on
    /devices/platform/ff903f00.iommu                     suspended
    /devices/platform/ff900000.vop                          suspended

I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
The process is as follows:

VOP requests to power off PD:
pm_runtime_put_sync(vop->dev)
    -> rpm_idle(vop)
        -> rpm_suspend(vop)
            -> __update_runtime_status(dev, RPM_SUSPENDING)
                -> rpm_callback(vop)
                    -> __rpm_callback(vop)
                        -> do power off pd callback(genpd_power_off)
                            -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
                                -> Handle link device callbacks according to device link(rpm_put_suppliers)
                                    -> pm_runtime_put(link->supplier)
                                        -> queue_work(pm_wq, &dev->power.work), execute immediately
                                            ->rpm_idle(iommu)
                                                -> rpm_suspend(iommu)
                                                    -> rpm_callback(iommu)
                                                        -> rk_iommu_suspend
                                                            ->  do power off pd callback(genpd_power_off)
                                                                -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
                                                                    -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
                                                                        -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
So, rpm_suspend() tries to suspend the supplier device link via
rpm_put_suppliers(), before it has updated its consumer device's state
to RPM_SUSPENDED.

This looks worrying to me, both because it's seems wrong to allow a
supplier to be suspended before a consumers device's state has reached
RPM_SUSPENDED - but also because it's not consistent with the way we
treat parent/child devices. The child's state will always be set to
RPM_SUSPENDED, before we try to suspend its parent by calling
rpm_idle() for it in rpm_suspend().

Rafael, what's your take on this? Would it make sense to align the
behavior for consumer/supplier-links in rpm_suspend() according to
child/parents?
Suspending the suppliers before changing the consumer RPM status to
"suspended" is indeed incorrect, which is something I overlooked when
writing the code in question.

Fortunately, it seems to be relatively easy to address.

Please see the appended tentative patch (untested).  It also avoids reading
runtime_status outside the lock which is arguably fishy.
Great, thanks for your quick reply!

A minor comment on the below change, but otherwise feel free add my
reviewed-by tag.
Thanks!

I'm going to submit an updated patch that avoids some unnecessary
locking overhead.
quoted
---
 drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru

        if (dev->power.irq_safe) {
                spin_unlock(&dev->power.lock);
+       } else if (!use_links) {
+               spin_unlock_irq(&dev->power.lock);
        } else {
+               bool get = dev->power.runtime_status == RPM_RESUMING;
+
                spin_unlock_irq(&dev->power.lock);

                /*
@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
                 * routine returns, so it is safe to read the status outside of
                 * the lock.
                 */
-               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+               if (get) {
                        idx = device_links_read_lock();

                        retval = rpm_get_suppliers(dev);
@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru

        if (dev->power.irq_safe) {
                spin_lock(&dev->power.lock);
+       } if (!use_links) {
This should be an "else if", I think.
Yes, it should, thanks!
quoted
+               spin_lock_irq(&dev->power.lock);
        } else {
+               bool put;
+
+               spin_lock_irq(&dev->power.lock);
+
+               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
+               if (put)
+                       __update_runtime_status(dev, RPM_SUSPENDED);
+               else
+                       put = dev->power.runtime_status == RPM_RESUMING && retval;
+
+               spin_unlock_irq(&dev->power.lock);
+
                /*
                 * If the device is suspending and the callback has returned
                 * success, drop the usage counters of the suppliers that have
@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
                 *
                 * Do that if resume fails too.
                 */
-               if (use_links
-                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+               if (put) {
                        idx = device_links_read_lock();

  fail:

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help