Thread (10 messages, 4 authors, 2021-03-04)

Re: [PATCH] mmc: Try power cycling card if command request times out

From: Marten Lindahl <hidden>
Date: 2021-03-02 07:12:10
Also in: lkml

Hi Adrian!

Thank you for your comments!

On Mon, Mar 01, 2021 at 11:40:03AM +0100, Adrian Hunter wrote:
On 1/03/21 10:50 am, Ulf Hansson wrote:
+ Adrian

On Tue, 16 Feb 2021 at 23:43, Mårten Lindahl [off-list ref] wrote:
Sometimes SD cards that have been running for a long time enter a state
where they cannot recover by themselves, but need a power cycle to be
operational again. Card status analysis has indicated that the card can
end up in a state where all external commands are ignored by the card
because it is halted by data timeouts.

If the card has been heavily used for a long time, it can be worn out
and should typically be replaced. However, some tests show that the
card can still be functional after a power cycle. Since the power cycle
requires an operator, the card can remain in a non-operational state for
a long time until the operator notices the problem.

This patch adds a function to power cycle the card if it does not
respond to a command, and then resends the command if the power cycle
was successful. This procedure is attempted once before giving up and
resuming normal host operation.
I assume the context above is all about the ioctl interface?

So, when the card enters this non-functional state, have you tried
just reading a block through the regular I/O interface? Does that
trigger a power cycle of the card, and does it then make the card
functional again?
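For reference, the check Ulf suggests (reading a block through the regular I/O path rather than an MMC ioctl) could look roughly like the user-space sketch below. The function name `read_first_block` and the 512-byte block size are assumptions for illustration, and the device node to pass in (for example `/dev/mmcblk0`) depends on the system. A buffered read may be satisfied from the page cache without reaching the card, in which case it says nothing about the card's state.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* assumed block size, for illustration only */

/*
 * Hypothetical helper: read one BLOCK_SIZE block at offset 0 of `path`
 * with pread(). Unlike an MMC_IOC_CMD ioctl, this goes through the
 * regular block I/O path (and may be served from the page cache).
 * Returns the number of bytes read, or -errno on failure.
 */
static ssize_t read_first_block(const char *path, unsigned char *buf)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = pread(fd, buf, BLOCK_SIZE, 0);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);

    return n < 0 ? -saved_errno : n;
}
```

A failing read here would be handled by the block layer's own recovery (`mmc_blk_mq_rw_recovery`), which is the behavior Ulf is asking about.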
Signed-off-by: Mårten Lindahl <redacted>
---
Please note: This might not be the way we want to handle these cases,
but at least it lets us start the discussion. In which cases should the
mmc framework deal with error messages like ETIMEDOUT, and in which
cases should it be handled by userspace?
The mmc framework tries to recover a failed block request
(mmc_blk_mq_rw_recovery) which may end up in a HW reset of the card.
Would it be an idea to act in a similar way when an ioctl times out?
Maybe it's a good idea to allow a similar reset for ioctls as we do
for regular I/O requests. My concern with this, though, is that we
might allow user space to trigger HW resets a bit too easily, and
that could damage the card.

Did you consider this?
 drivers/mmc/core/block.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 42e27a298218..d007b2af64d6 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -976,6 +976,7 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
  */
 static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 {
+       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
        struct mmc_queue_req *mq_rq;
        struct mmc_card *card = mq->card;
        struct mmc_blk_data *md = mq->blkdata;
@@ -983,7 +984,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
        bool rpmb_ioctl;
        u8 **ext_csd;
        u32 status;
-       int ret;
+       int ret, retry = 1;
        int i;

        mq_rq = req_to_mmc_queue_req(req);
@@ -994,9 +995,24 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
        case MMC_DRV_OP_IOCTL_RPMB:
SD cards do not have RPMB.  Did you mean eMMC?
No, you are right. This action should be excluded from 'case MMC_DRV_OP_IOCTL_RPMB'.
                idata = mq_rq->drv_op_data;
                for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) {
+cmd_do:
                        ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
-                       if (ret)
+                       if (ret == -ETIMEDOUT) {
+                               dev_warn(mmc_dev(card->host),
+                                        "error %d sending command\n", ret);
+cmd_reset:
+                               mmc_blk_reset_success(md, type);
mmc_blk_reset_success() is called upon success, not failure.  The reset will
not be attempted twice in a row, for a given type, without a "success" in
between.
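The guard Adrian describes can be modelled as one bit per request type: a reset sets the bit, a later reset of the same type is refused while the bit is set, and only a success clears it. The sketch below is a standalone model based on my reading of `mmc_blk_reset()` / `mmc_blk_reset_success()` in `drivers/mmc/core/block.c` (a `reset_done` bitmask, with `-EEXIST` on a repeated attempt); the names `blk_model`, `model_reset`, `model_reset_success`, `BLK_READ` and `BLK_WRITE` are hypothetical, and the real code may differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for MMC_BLK_READ / MMC_BLK_WRITE: one bit each. */
enum { BLK_READ = 1 << 0, BLK_WRITE = 1 << 1 };

struct blk_model {
    unsigned int reset_done;   /* bit set => a reset was already tried for that type */
    int hw_resets;             /* number of times the (simulated) HW reset ran */
};

/* Refuse a second reset for `type` until a success has cleared its bit. */
static int model_reset(struct blk_model *md, unsigned int type)
{
    assert(type == BLK_READ || type == BLK_WRITE);
    if (md->reset_done & type)
        return -EEXIST;
    md->reset_done |= type;
    md->hw_resets++;           /* stands in for the actual hardware reset */
    return 0;
}

/* Called after a request of `type` completes successfully. */
static void model_reset_success(struct blk_model *md, unsigned int type)
{
    md->reset_done &= ~type;
}
```

Calling `model_reset_success()` on an error path, as the patch does at `cmd_reset`, clears the bit without any real success and so lets every subsequent timeout trigger another reset.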
OK, I see. This line and the cmd_reset label should be removed, and if
mmc_blk_reset fails we should break, not retry.

Kind regards
Mårten
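One possible reading of the reworked flow discussed above: no `cmd_reset` label, give up if the power cycle fails, skip the retry entirely for RPMB, resend the timed-out command once, and record a success only when a command actually completes. This is a user-space model with a simulated card, not kernel code; `sim_card`, `sim_send_cmd`, `sim_reset` and `issue_ioctl_cmds` are hypothetical stand-ins for `__mmc_blk_ioctl_cmd()`, `mmc_blk_reset()` and `mmc_blk_reset_success()`, and placing the success call right after the resent command is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulated card: the first `timeouts` commands time out. */
struct sim_card {
    int timeouts;    /* commands left that will return -ETIMEDOUT */
    bool reset_ok;   /* whether a power cycle attempt succeeds */
    int sent;        /* commands sent, including resends */
    int resets;      /* power cycles attempted */
    int successes;   /* places where mmc_blk_reset_success() would run */
};

static int sim_send_cmd(struct sim_card *c)
{
    c->sent++;
    if (c->timeouts > 0) {
        c->timeouts--;
        return -ETIMEDOUT;
    }
    return 0;
}

static int sim_reset(struct sim_card *c)
{
    c->resets++;
    return c->reset_ok ? 0 : -EIO;
}

/* Issue `count` commands; at most one power cycle per request, none for RPMB. */
static int issue_ioctl_cmds(struct sim_card *c, int count, bool rpmb)
{
    int ret = 0;
    int retry = rpmb ? 0 : 1;

    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        ret = sim_send_cmd(c);
        if (ret == -ETIMEDOUT && retry > 0) {
            retry--;
            if (sim_reset(c))
                break;              /* reset failed: give up, keep -ETIMEDOUT */
            ret = sim_send_cmd(c);  /* resend the same command once */
            if (!ret)
                c->successes++;     /* success only after a completed command */
        }
        if (ret)
            break;
    }
    return ret;
}
```

A bounded resend inside the loop replaces the two `goto` labels, so the control flow cannot loop back into the reset path.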
+                               if (retry--) {
+                                       dev_warn(mmc_dev(card->host),
+                                                "power cycling card\n");
+                                       if (mmc_blk_reset
+                                           (md, card->host, type))
+                                               goto cmd_reset;
+                                       mmc_blk_reset_success(md, type);
+                                       goto cmd_do;
+                               }
                                break;
+                       }
                }
                /* Always switch back to main area after RPMB access */
                if (rpmb_ioctl)
--
2.11.0
Kind regards
Uffe