Re: [REPORT_ISSUE]: RK3399 pd power down failed
From: elaine.zhang <hidden>
Date: 2021-02-24 03:43:09
Also in:
linux-rockchip
Hi, Rafael: 在 2021/2/24 上午1:09, Rafael J. Wysocki 写道:
quoted hunk ↗ jump to hunk
On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:quoted
On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com [off-list ref] wrote:quoted
Hi, Heiko : In rk3399 evb board, I found a probabilistic problem about PD. Turning off PD occasionally failed. log show: Open the vop #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a close the vop #enter # cat sys/kernel/debug/pm_genpd/pm_genpd_summary domain status slaves /device runtime status ---------------------------------------------------------------------- pd_vopl off pd_vopb on /devices/platform/ff903f00.iommu suspended /devices/platform/ff900000.vop suspended I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately, PD power off may be failed. The process is as follows: VOP requests to power off PD: pm_runtime_put_sync(vop->dev) -> rpm_idle(vop) -> rpm_suspend(vop) -> __update_runtime_status(dev, RPM_SUSPENDING) -> rpm_callback(vop) -> __rpm_callback(vop) -> do power off pd callback(genpd_power_off) -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。 -> Handle link device callbacks according to device link(rpm_put_suppliers) -> pm_runtime_put(link->supplier) -> queue_work(pm_wq, &dev->power.work), execute immediately ->rpm_idle(iommu) -> rpm_suspend(iommu) -> rpm_callback(iommu) -> rk_iommu_suspend -> do power off pd callback(genpd_power_off) -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。 -> iommu do __update_runtime_status(dev, RPM_SUSPENDED) -> vop do __update_runtime_status(dev, RPM_SUSPENDED)So, rpm_suspend() tries to suspend the supplier device link via rpm_put_suppliers(), before it has updated its consumer device's state to RPM_SUSPENDED. This looks worrying to me, both because it's seems wrong to allow a supplier to be suspended before a consumers device's state has reached RPM_SUSPENDED - but also because it's not consistent with the way we treat parent/child devices. The child's state will always be set to RPM_SUSPENDED, before we try to suspend its parent by calling rpm_idle() for it in rpm_suspend(). Rafael, what's your take on this? Would it make sense to align the behavior for consumer/supplier-links in rpm_suspend() according to child/parents?Suspending the suppliers before changing the consumer RPM status to "suspended" is indeed incorrect, which is something I overlooked when writing the code in question. Fortunately, it seems to be relatively easy to address. Please see the appended tentative patch (untested). It also avoids reading runtime_status outside the lock which is arguably fishy. --- drivers/base/power/runtime.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) Index: linux-pm/drivers/base/power/runtime.c ===================================================================--- linux-pm.orig/drivers/base/power/runtime.c +++ linux-pm/drivers/base/power/runtime.c@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru if (dev->power.irq_safe) { spin_unlock(&dev->power.lock); + } else if (!use_links) { + spin_unlock_irq(&dev->power.lock); } else { + bool get = dev->power.runtime_status == RPM_RESUMING; + spin_unlock_irq(&dev->power.lock); /*@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru * routine returns, so it is safe to read the status outside of * the lock. */ - if (use_links && dev->power.runtime_status == RPM_RESUMING) { + if (get) { idx = device_links_read_lock(); retval = rpm_get_suppliers(dev);@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru if (dev->power.irq_safe) { spin_lock(&dev->power.lock); + } if (!use_links) { + spin_lock_irq(&dev->power.lock); } else { + bool put; + + spin_lock_irq(&dev->power.lock); + + put = dev->power.runtime_status == RPM_SUSPENDING && !retval; + if (put) + __update_runtime_status(dev, RPM_SUSPENDED); + else + put = dev->power.runtime_status == RPM_RESUMING && retval; + + spin_unlock_irq(&dev->power.lock); + /* * If the device is suspending and the callback has returned * success, drop the usage counters of the suppliers that have@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru * * Do that if resume fails too. */ - if (use_links - && ((dev->power.runtime_status == RPM_SUSPENDING && !retval) - || (dev->power.runtime_status == RPM_RESUMING && retval))) { + if (put) { idx = device_links_read_lock(); fail:
Thank you for your reply. I have tested this patch, and it's works well.Perfect solution to our problem. We expect this patch to be committed to the mainline branch.