Thread (5 messages) 5 messages, 4 authors, 2021-02-24

Re: [REPORT_ISSUE]: RK3399 pd power down failed

From: elaine.zhang <hidden>
Date: 2021-02-24 03:43:09
Also in: linux-rockchip

Hi,  Rafael:

在 2021/2/24 上午1:09, Rafael J. Wysocki 写道:
quoted hunk ↗ jump to hunk
On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
quoted
On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
[off-list ref] wrote:
quoted
Hi, Heiko :

In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.

log show:
Open the vop
#modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a

close the vop
#enter

  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
domain                          status          slaves
     /device                                             runtime status
----------------------------------------------------------------------
pd_vopl                         off
pd_vopb                         on
     /devices/platform/ff903f00.iommu                     suspended
     /devices/platform/ff900000.vop                          suspended

I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
The process is as follows:

VOP requests to power off PD:
pm_runtime_put_sync(vop->dev)
     -> rpm_idle(vop)
         -> rpm_suspend(vop)
             -> __update_runtime_status(dev, RPM_SUSPENDING)
                 -> rpm_callback(vop)
                     -> __rpm_callback(vop)
                         -> do power off pd callback(genpd_power_off)
                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
                                     -> pm_runtime_put(link->supplier)
                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
                                             ->rpm_idle(iommu)
                                                 -> rpm_suspend(iommu)
                                                     -> rpm_callback(iommu)
                                                         -> rk_iommu_suspend
                                                             ->  do power off pd callback(genpd_power_off)
                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
So, rpm_suspend() tries to suspend the supplier device link via
rpm_put_suppliers(), before it has updated its consumer device's state
to RPM_SUSPENDED.

This looks worrying to me, both because it's seems wrong to allow a
supplier to be suspended before a consumers device's state has reached
RPM_SUSPENDED - but also because it's not consistent with the way we
treat parent/child devices. The child's state will always be set to
RPM_SUSPENDED, before we try to suspend its parent by calling
rpm_idle() for it in rpm_suspend().

Rafael, what's your take on this? Would it make sense to align the
behavior for consumer/supplier-links in rpm_suspend() according to
child/parents?
Suspending the suppliers before changing the consumer RPM status to
"suspended" is indeed incorrect, which is something I overlooked when
writing the code in question.

Fortunately, it seems to be relatively easy to address.

Please see the appended tentative patch (untested).  It also avoids reading
runtime_status outside the lock which is arguably fishy.

---
  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
  1 file changed, 20 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
  
  	if (dev->power.irq_safe) {
  		spin_unlock(&dev->power.lock);
+	} else if (!use_links) {
+		spin_unlock_irq(&dev->power.lock);
  	} else {
+		bool get = dev->power.runtime_status == RPM_RESUMING;
+
  		spin_unlock_irq(&dev->power.lock);
  
  		/*
@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
  		 * routine returns, so it is safe to read the status outside of
  		 * the lock.
  		 */
-		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+		if (get) {
  			idx = device_links_read_lock();
  
  			retval = rpm_get_suppliers(dev);
@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
  
  	if (dev->power.irq_safe) {
  		spin_lock(&dev->power.lock);
+	} if (!use_links) {
+		spin_lock_irq(&dev->power.lock);
  	} else {
+		bool put;
+
+		spin_lock_irq(&dev->power.lock);
+
+		put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
+		if (put)
+			__update_runtime_status(dev, RPM_SUSPENDED);
+		else
+			put = dev->power.runtime_status == RPM_RESUMING && retval;
+
+		spin_unlock_irq(&dev->power.lock);
+
  		/*
  		 * If the device is suspending and the callback has returned
  		 * success, drop the usage counters of the suppliers that have
@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
  		 *
  		 * Do that if resume fails too.
  		 */
-		if (use_links
-		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+		if (put) {
  			idx = device_links_read_lock();
  
   fail:
Thank you for your reply.

I have tested this patch, and it's works well.Perfect solution to our 
problem.

We expect this patch to be committed to the mainline branch.

  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help