Thread (27 messages) 27 messages, 4 authors, 2010-09-29
STALE5727d

[PATCH] mmc: failure of block read wait for long time

From: Adrian Hunter <hidden>
Date: 2010-09-10 13:02:21
Also in: linux-mmc

Ghorai, Sukumar wrote:
quoted
-----Original Message-----
From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
Sent: Friday, September 10, 2010 5:14 PM
To: Ghorai, Sukumar
Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH] mmc: failure of block read wait for long time

Ghorai, Sukumar wrote:
quoted
quoted
-----Original Message-----
From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
Sent: Wednesday, July 28, 2010 2:11 PM
To: Ghorai, Sukumar
Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH] mmc: failure of block read wait for long time

Ghorai, Sukumar wrote:
quoted
quoted
-----Original Message-----
From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
Sent: Tuesday, July 27, 2010 6:52 PM
To: Ghorai, Sukumar
Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org
Subject: Re: [PATCH] mmc: failure of block read wait for long time

Sukumar Ghorai wrote:
quoted
 multi-block read failure retries in single block read one by one.
It
quoted
quoted
quoted
quoted
continues
quoted
 retry of subsequent blocks, even after failure. Application will
not
quoted
quoted
be
quoted
quoted
able
quoted
 to decode the interleave data (even if few single block read
success).
quoted
quoted
quoted
quoted
quoted
 This patch fixes this problem by returning at the first failure
instead
quoted
quoted
of
quoted
 waiting for long duration.
How can you know what sectors are important to the upper layers?
[Ghorai]
1. This is obvious either give the complete data to above layer or not.
And every protocol follows that.
quoted
2. And other scenario, say apps request for to read the 128MB data and
it will take quite long time before returning in failure case.
quoted
3. it is quite obvious if the read fail for sector(x) it will fail for
sector(x+1)
None of that is exactly true.  However you could change the retry
from sectors to segments e.g.

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index d545f79..5ed15f6 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -321,13 +321,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue
*mq,
quoted
quoted
struct request *req)
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request brq;
-	int ret = 1, disable_multi = 0;
+	int ret = 1, retrying = 0;

 	mmc_claim_host(card->host);

 	do {
 		struct mmc_command cmd;
 		u32 readcmd, writecmd, status = 0;
+		unsigned int blocks;

 		memset(&brq, 0, sizeof(struct mmc_blk_request));
 		brq.mrq.cmd = &brq.cmd;
@@ -341,23 +342,26 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue
*mq,
quoted
quoted
struct request *req)
 		brq.stop.opcode = MMC_STOP_TRANSMISSION;
 		brq.stop.arg = 0;
 		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-		brq.data.blocks = blk_rq_sectors(req);
+
+		/*
+		 * After a read error, we redo the request one segment at a
time
+		 * in order to accurately determine which segments can be read
+		 * successfully.
+		 */
+		if (retrying)
+			blocks = blk_rq_cur_sectors(req);
+		else
+			blocks = blk_rq_sectors(req);

 		/*
 		 * The block layer doesn't support all sector count
 		 * restrictions, so we need to be prepared for too big
 		 * requests.
 		 */
-		if (brq.data.blocks > card->host->max_blk_count)
-			brq.data.blocks = card->host->max_blk_count;
+		if (blocks > card->host->max_blk_count)
+			blocks = card->host->max_blk_count;

-		/*
-		 * After a read error, we redo the request one sector at a
time
-		 * in order to accurately determine which sectors can be read
-		 * successfully.
-		 */
-		if (disable_multi && brq.data.blocks > 1)
-			brq.data.blocks = 1;
+		brq.data.blocks = blocks;

 		if (brq.data.blocks > 1) {
 			/* SPI multiblock writes terminate using a special
@@ -418,11 +422,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue
*mq,
quoted
quoted
struct request *req)
 		 * programming mode even when things go wrong.
 		 */
 		if (brq.cmd.error || brq.data.error || brq.stop.error) {
-			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
-				/* Redo read one sector at a time */
-				printk(KERN_WARNING "%s: retrying using single "
-				       "block read\n", req->rq_disk->disk_name);
-				disable_multi = 1;
+			if (!retrying && rq_data_dir(req) == READ &&
+			    blk_rq_cur_sectors(req) &&
+			    blk_rq_cur_sectors(req) < blocks) {
+				/* Redo read one segment at a time */
+				printk(KERN_WARNING "%s: retrying read one "
+						    "segment at a time\n",
+						    req->rq_disk->disk_name);
+				retrying = 1;
 				continue;
 			}
 			status = get_card_status(card, req);
@@ -486,12 +493,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue
*mq,
quoted
quoted
struct request *req)
 		if (brq.cmd.error || brq.stop.error || brq.data.error) {
 			if (rq_data_dir(req) == READ) {
 				/*
-				 * After an error, we redo I/O one sector at a
+				 * After an error, we redo I/O one segment at a
 				 * time, so we only reach here after trying to
-				 * read a single sector.
+				 * read a single segment.  Error out the whole
+				 * segment and continue to the next.
 				 */
 				spin_lock_irq(&md->lock);
-				ret = __blk_end_request(req, -EIO,
brq.data.blksz);
+				ret = __blk_end_request(req, -EIO,
blk_rq_cur_sectors(req) << 9);
 				spin_unlock_irq(&md->lock);
 				continue;
 			}

quoted
quoted
quoted
Signed-off-by: Sukumar Ghorai <redacted>
---
 drivers/mmc/card/block.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index cb9fbc8..cfb0827 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
struct request *req)
quoted
 				spin_lock_irq(&md->lock);
 				ret = __blk_end_request(req, -EIO,
brq.data.blksz);
quoted
 				spin_unlock_irq(&md->lock);
-				continue;
 			}
 			goto cmd_err;
 		}
--
[Ghorai] Adrian,
Yes this works and reduced the retry by 1/4 (2048 to 512 times for 1MB
data read) form the original code;
quoted
Initially it was retrying for each page(512 bytes) after multi-block
read fail; but this solution is retying for each segment(2048 bytes);
quoted
1. Now say block layrer reading 1MB and failed for the 1st segment. So
it will still retry for 1MB/2048-bytes, i.e. 512 times.
quoted
2. So do you think any good reason to retry again and again?
If you have 1MB that is not readable, it sounds like the card is broken.
Why are so many reads failing?  Has the card been removed?

You might very rarely see ECC errors in a small number of sectors,
but more than that sounds like something else is broken.
[Ghorai] yes, one example is we remove the card when reading data, 
Well, that is a different case.  Once the card has gone, the block driver
can (and will once the remove method is called) error out all I/O
requests without sending them to MMC.  That doesn't happen until there
is a card detect interrupt and a resulting rescan.

A possible solution is to put a flag on mmc_card to indicate card_gone
that gets set as soon as the drivers card detect interrupt shows there
is no card (hoping that we are not getting any bouncing on card detect)
and then have mmc_wait_for_req() simple return -ENODEV immediately if
the card_gone flag is set.  Finally, if the mmc block driver sees
a -ENODEV error, it should also check the card_gone flag (via a new
core function) and if the card is gone, do not retry - and perhaps
even error out the rest of the I/O request queue as well.

I can suggest a patch if you want but I am on vacation next week so
it will have to wait a couple of weeks.
And moreover we should not give the interleave data to apps, as we don't have option to tell application for the valid data.
quoted
quoted
3. And again I mentioned how to inform the blok-layer that which segment
having the valid data?
quoted
And I was suggesting for not to retry again if 1st retry (single block)
failed.
quoted
http://comments.gmane.org/gmane.linux.kernel.mmc/2714
quoted
quoted
quoted
quoted
To unsubscribe from this list: send the line "unsubscribe linux-mmc"
in
quoted
quoted
quoted
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Regards,
Ghorai
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help