Thread (27 messages) 27 messages, 6 authors, 2018-09-14

Re: [PATCH v4 07/10] block, scsi: Rework runtime power management

From: Bart Van Assche <hidden>
Date: 2018-08-06 15:20:11

On Sat, 2018-08-04 at 18:01 +0800, Ming Lei wrote:
On Sat, Aug 4, 2018 at 8:03 AM, Bart Van Assche <bart.vanassche�
EA-wdc.com> wrote:
quoted
=20
diff --git a/block/blk-pm.c b/block/blk-pm.c
index 2a4632d0be4b..0708185ead61 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -107,17 +107,31 @@ int blk_pre_runtime�
AF8-suspend(struct request_queue *q)
quoted
        if (!q->dev)
                return ret;
=20
+       WARN_ON_ONCE(q->rpm_status != RP=
M_ACTIVE);
quoted
+
        blk_pm_runtime_lock(q);
+       /*
+        * Switch to preempt-only mode before calling sync=
hronize_rcu() such
quoted
+        * that later blk_queue_enter() calls see =
the preempt-only state. See
quoted
+        * also http://lwn.net/Articles/573497/.
+        */
+       blk_set_pm_only(q);
+       ret = -EBUSY;
+       if (percpu_ref_read(&q->q_usage�
8-counter) == 1) {
quoted
+               synchronize_rcu();
+               if (percpu_ref_read(&q->q_=
usage_counter) == 1)
quoted
+                       ret = 0;
+       }
=20
The above code looks too tricky, and the following issue might exist =
in case
of potential WRITEs re-order from blk_queue_enter().
=20
1) suppose there are one in-flight request not completed yet
=20
2) observer is the percpu_ref_read() following synchronize�
8-rcu().
=20
3) inside blk_queue_enter(), WRITE in percpu_ref_tryg=
et_live()/percpu_ref_put()
can be re-ordered from view of the observer in 2)
=20
4) then percpu_ref_read() may return 1 even though the only i=
n-flight
request isn't completed
=20
5) suspend still moves on, and data loss may be triggered
=20
Cc Paul, and Alan have been CCed already, so we have several memory
model experts here, :-), hope they may help to clarify if this reorde=
r case
may exist.
Hello Ming,

There are two cases:
(a) The percpu_ref_tryget_live() call in (3) is triggered by a =
runtime power
    state transition (RPM_*).
(b) The percpu_ref_tryget_live() call in (3) is not triggered b=
y a runtime
    power state transition.

In case (b) the driver that submits the request should protect the request
with scsi_autopm_get_device() / scsi_autopm_put_dev=
ice() or by any other
mechanism that prevents runtime power state transitions. Since
scsi_autopm_get_device() is called before blk_queue_ent=
er(), since
scsi_autopm_get_device() only returns after it has resumed a de=
vice and since
scsi_autopm_get_device() prevents runtime suspend, the function
blk_pre_runtime_suspend() won't be called concurrently with blk=
_queue_enter()
in case (b).

Since the power management serializes power state transitions also for case
(a) the race described above won't happen. blk_pre_runtime_susp=
end() will have
finished before any RQF_PM requests are submitted.

Bart.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help