Thread (24 messages) 24 messages, 6 authors, 2021-07-01

Re: [PATCH 2/2] PM: domain: use per-genpd lockdep class

From: Bjorn Andersson <hidden>
Date: 2021-06-15 16:00:48
Also in: lkml

On Tue 15 Jun 05:17 CDT 2021, Ulf Hansson wrote:
+ Mark

On Fri, 11 Jun 2021 at 16:34, Dmitry Baryshkov
[off-list ref] wrote:
quoted
Added Stephen to Cc list

On Fri, 11 Jun 2021 at 16:50, Ulf Hansson [off-list ref] wrote:
quoted
On Fri, 11 Jun 2021 at 12:15, Dmitry Baryshkov
[off-list ref] wrote:
quoted
In case of nested genpds it is easy to get the following warning from
lockdep, because all genpd's mutexes share same locking class. Use the
per-genpd locking class to stop lockdep from warning about possible
deadlocks. It is not possible to directly use genpd nested locking, as
it is not the genpd code calling genpd. There are interim calls to
regulator core.

[    3.030219] ============================================
[    3.030220] WARNING: possible recursive locking detected
[    3.030221] 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480 Not tainted
[    3.030222] --------------------------------------------
[    3.030223] kworker/u16:0/7 is trying to acquire lock:
[    3.030224] ffffde0eabd29aa0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
[    3.030236]
[    3.030236] but task is already holding lock:
[    3.030236] ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
[    3.030240]
[    3.030240] other info that might help us debug this:
[    3.030240]  Possible unsafe locking scenario:
[    3.030240]
[    3.030241]        CPU0
[    3.030241]        ----
[    3.030242]   lock(&genpd->mlock);
[    3.030243]   lock(&genpd->mlock);
[    3.030244]
[    3.030244]  *** DEADLOCK ***
[    3.030244]
[    3.030244]  May be due to missing lock nesting notation
[    3.030244]
[    3.030245] 6 locks held by kworker/u16:0/7:
[    3.030246]  #0: ffff6cca00010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730
[    3.030252]  #1: ffff8000100c3db0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730
[    3.030255]  #2: ffff6cca00ce3188 (&dev->mutex){....}-{3:3}, at: __device_attach+0x3c/0x184
[    3.030260]  #3: ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
[    3.030264]  #4: ffff8000100c3968 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_lock_dependent+0x6c/0x1b0
[    3.030270]  #5: ffff6cca00a59158 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive+0x94/0x1d0
[    3.030273]
[    3.030273] stack backtrace:
[    3.030275] CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480
[    3.030276] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
[    3.030278] Workqueue: events_unbound deferred_probe_work_func
[    3.030280] Call trace:
[    3.030281]  dump_backtrace+0x0/0x1a0
[    3.030284]  show_stack+0x18/0x24
[    3.030286]  dump_stack+0x108/0x188
[    3.030289]  __lock_acquire+0xa20/0x1e0c
[    3.030292]  lock_acquire.part.0+0xc8/0x320
[    3.030294]  lock_acquire+0x68/0x84
[    3.030296]  __mutex_lock+0xa0/0x4f0
[    3.030299]  mutex_lock_nested+0x40/0x50
[    3.030301]  genpd_lock_mtx+0x18/0x2c
[    3.030303]  dev_pm_genpd_set_performance_state+0x94/0x1a0
[    3.030305]  reg_domain_enable+0x28/0x4c
[    3.030308]  _regulator_do_enable+0x420/0x6b0
[    3.030310]  _regulator_enable+0x178/0x1f0
[    3.030312]  regulator_enable+0x3c/0x80
At a closer look, I am pretty sure that it's the wrong code design
that triggers this problem, rather than that we have a real problem in
genpd. To put it simply, the code in genpd isn't designed to work like
this. We will end up in circular looking paths, leading to deadlocks,
sooner or later if we allow the above code path.

To fix it, the regulator here needs to be converted to a proper PM
domain. This PM domain should be assigned as the parent to the one
that is requested to be powered on.
This more or less resembles original design, replaced per review
request to use separate regulator
(https://lore.kernel.org/linux-arm-msm/160269659638.884498.4031967462806977493@swboyd.mtv.corp.google.com/ (local),
https://lore.kernel.org/linux-arm-msm/20201023131925.334864-1-dmitry.baryshkov@linaro.org/ (local)).
Thanks for the pointers. In hindsight, it looks like the
"regulator-fixed-domain" DT binding wasn't the right thing.

Fortunately, it looks like the problem can be quite easily fixed, by
moving to a correct model of the domain hierarchy.
Can you give some pointers to how we actually fix this?

The problem that lead us down this path is that drivers/clk/qcom/gdsc.c
describes power domains, which are parented by domains provided by
drivers/soc/qcom/rpmhpd.c.

But I am unable to find a way for the gdsc driver to get hold of the
struct generic_pm_domain of the resources exposed by the rpmhpd driver.


The second thing that Dmitry's regulator driver does is to cast the
appropriate performance state vote on the rpmhpd resource, but I _think_
we can do that using OPP tables in the gdsc client's node...
Beyond this, perhaps we should consider removing the
"regulator-fixed-domain" DT property, as to avoid similar problems
from cropping up?
Currently there's a single user upstream, but we have the exact same
problem in at least 3-4 platforms with patches in the pipeline.

In order to avoid breakage with existing DT I would prefer to see
regulator-fixed-domain to live for at least one kernel release beyond
the introduction of the other model.

Regards,
Bjorn
Mark, what do you think?
quoted
Stephen, would it be fine to you to convert the mmcx regulator into
the PM domain?
quoted
quoted
[    3.030314]  gdsc_toggle_logic+0x30/0x124
[    3.030317]  gdsc_enable+0x60/0x290
[    3.030318]  _genpd_power_on+0xc0/0x134
[    3.030320]  genpd_power_on.part.0+0xa4/0x1f0
[    3.030322]  __genpd_dev_pm_attach+0xf4/0x1b0
[    3.030324]  genpd_dev_pm_attach+0x60/0x70
[    3.030326]  dev_pm_domain_attach+0x54/0x5c
[    3.030329]  platform_probe+0x50/0xe0
[    3.030330]  really_probe+0xe4/0x510
[    3.030332]  driver_probe_device+0x64/0xcc
[    3.030333]  __device_attach_driver+0xb8/0x114
[    3.030334]  bus_for_each_drv+0x78/0xd0
[    3.030337]  __device_attach+0xdc/0x184
[    3.030338]  device_initial_probe+0x14/0x20
[    3.030339]  bus_probe_device+0x9c/0xa4
[    3.030340]  deferred_probe_work_func+0x88/0xc4
[    3.030342]  process_one_work+0x298/0x730
[    3.030343]  worker_thread+0x74/0x470
[    3.030344]  kthread+0x168/0x170
[    3.030346]  ret_from_fork+0x10/0x34

Signed-off-by: Dmitry Baryshkov <redacted>
[...]

Kind regards
Uffe
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help