Thread (27 messages), 4 authors, 2010-09-29

[PATCH] mmc: failure of block read wait for long time

From: Ghorai, Sukumar <hidden>
Date: 2010-09-20 12:37:24
Also in: linux-mmc

-----Original Message-----
From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
Sent: Monday, September 20, 2010 5:20 PM
To: Ghorai, Sukumar
Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
Adrian Hunter
Subject: Re: [PATCH] mmc: failure of block read wait for long time

On 20/09/10 11:57, Ghorai, Sukumar wrote:
Adrian,
-----Original Message-----
From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
Sent: Monday, September 20, 2010 1:24 PM
To: Ghorai, Sukumar
Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
Adrian Hunter
Subject: Re: [PATCH] mmc: failure of block read wait for long time

On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
Adrian,

[..snip..]
[Ghorai] Adrian,
Yes, this works and reduces the retries to a quarter of the original code
(from 2048 to 512 for a 1MB data read).
Initially it was retrying for each page (512 bytes) after a multi-block
read failure, but this solution retries for each segment (2048 bytes).
1. Now say the block layer is reading 1MB and fails on the 1st segment. It
will still retry 1MB / 2048 bytes, i.e. 512 times.
2. So do you see any good reason to retry again and again?
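
The retry counts quoted above follow from dividing the request size by the
retry granularity. A minimal sketch of that arithmetic, assuming a
hypothetical helper retry_count() that is not part of the MMC code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): number of single-unit
 * retries needed to walk a failed request of request_bytes when each
 * retry covers unit_bytes.
 */
size_t retry_count(size_t request_bytes, size_t unit_bytes)
{
    assert(unit_bytes > 0);
    /* Round up so a trailing partial unit still counts as one retry. */
    return (request_bytes + unit_bytes - 1) / unit_bytes;
}
```

With a 1MB request, retry_count(1024 * 1024, 512) gives 2048 and
retry_count(1024 * 1024, 2048) gives 512, matching the figures in the mail.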
If you have 1MB that is not readable, it sounds like the card is broken.
Why are so many reads failing?  Has the card been removed?

You might very rarely see ECC errors in a small number of sectors,
but more than that sounds like something else is broken.
[Ghorai] Yes, one example is when we remove the card while reading data.
Well, that is a different case.  Once the card has gone, the block driver
can (and will once the remove method is called) error out all I/O
requests without sending them to MMC.  That doesn't happen until there
is a card detect interrupt and a resulting rescan.
[Ghorai] Here we are discussing two problems:
1. If I/O fails, how to stop retrying, when the failure is caused by:
	a. an internal card error
	b. an issue in the filesystem, driver, or host controller
	c. the card being removed.

2. How to sync block-layer I/O if the card is removed:
	a. case 1: when the platform supports a card-removal interrupt
	b. case 2: when the platform does not support a card-removal interrupt?
A possible solution is to put a flag on mmc_card to indicate card_gone
that gets set as soon as the driver's card detect interrupt shows there
is no card (hoping that we are not getting any bouncing on card detect)
and then have mmc_wait_for_req() simply return -ENODEV immediately if
the card_gone flag is set.  Finally, if the mmc block driver sees
a -ENODEV error, it should also check the card_gone flag (via a new
core function) and if the card is gone, do not retry - and perhaps
even error out the rest of the I/O request queue as well.
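
The card_gone proposal above can be sketched in user-space C as follows.
This is only an illustration under stated assumptions: every sketch_* name
is hypothetical, the structures are simplified stand-ins for mmc_card and
the MMC request, and the wait function returns int here purely to keep the
example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not the real definitions. */
struct sketch_card {
    bool card_gone;     /* set from the card-detect interrupt path */
};

struct sketch_request {
    int error;          /* 0 on success, negative errno on failure */
};

/* Would be called from the host driver's card-detect handler. */
void sketch_card_detect_irq(struct sketch_card *card, bool present)
{
    card->card_gone = !present;
}

/* The "new core function" that lets the block driver query the flag. */
bool sketch_card_is_gone(const struct sketch_card *card)
{
    return card->card_gone;
}

/* Stand-in for mmc_wait_for_req(): fail fast once the card is gone. */
int sketch_wait_for_req(struct sketch_card *card, struct sketch_request *req)
{
    assert(card != NULL && req != NULL);
    if (card->card_gone) {
        req->error = -ENODEV;
        return req->error;
    }
    req->error = 0;     /* real code would issue the request to the host here */
    return 0;
}

/* Block-driver side: do not retry when -ENODEV is due to card removal. */
bool sketch_should_retry(const struct sketch_card *card, int error)
{
    if (error == 0)
        return false;
    if (error == -ENODEV && sketch_card_is_gone(card))
        return false;
    return true;
}
```

Other errors (for example -EIO) still go through the normal retry path, so
only the removal case is short-circuited.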
[Ghorai] Your idea addresses case 2.a, but not 2.b, 1.a, or 1.b.
The card removal case can be extended to use the bus ops detect method
when there is no card detect irq.  I will send an RFC patch.

With respect to 1.a:
   - If the card has an internal error, then it is broken.  The user
   should remove the card and use a better one.  I do not see how
   reducing retry delays really helps the user very much.  Arguably if
   the card becomes unresponsive, the MMC core could provide a facility
   to reinitialise the card, but that is yet another issue.

With respect to 1.b:
   - The file system cannot cause the block driver to have I/O errors.
   - If there are errors in the driver they should be fixed.
   - If there are hardware problems with the host controller, then
   it is up to the host controller driver to deal with them e.g.
   by resetting the controller.  I don't see what this has to do with
   the block driver.

You leave out the important case of ECC errors.  I am concerned about
this because of the possibility that it happens inside a file system
journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
error only affects the last transaction, but perhaps not if it destroys
other transactions - which could happen if the approach you suggest
is taken.
[Ghorai] Thanks a lot for your descriptive answer.
1. Can you answer this? Case 2.b: what happens when the platform does not
support a card-removal interrupt?

As I wrote above: The card removal case can be extended to use the bus ops
detect method when there is no card detect irq.  I will send an RFC patch.
2. Why does the block layer handle interleaved data? Can you give an example
of a driver that returns interleaved data? And how do we tell the application
that the buffer holds interleaved data?

I am not sure what you mean by interleave data, but file systems for example
are free to map any block to any file, directory or file system object,
so a consecutive series of sectors may contain unrelated data.  Up to a
maximum size, the block layer merges I/O requests when the sectors are
consecutive, so an I/O request can also contain unrelated data.
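
The merging rule described above (consecutive sectors, up to a maximum size)
can be illustrated with a small sketch. The struct sketch_io type and the
function sketch_try_back_merge() are hypothetical and much simpler than the
real block layer, which tracks far more state per request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified I/O request: a run of sectors. */
struct sketch_io {
    unsigned long start;      /* first sector */
    unsigned long nsectors;   /* length in sectors */
};

/*
 * Append next to req if next starts exactly where req ends and the
 * merged request stays within max_sectors.  Returns true on merge.
 * Nothing here looks at which file the sectors belong to, which is why
 * a merged request can carry unrelated data.
 */
bool sketch_try_back_merge(struct sketch_io *req,
                           const struct sketch_io *next,
                           unsigned long max_sectors)
{
    assert(req != NULL && next != NULL);
    if (req->start + req->nsectors != next->start)
        return false;         /* not consecutive */
    if (req->nsectors + next->nsectors > max_sectors)
        return false;         /* would exceed the size limit */
    req->nsectors += next->nsectors;
    return true;
}
```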
[Ghorai]
1. I don't think so. The FS knows where data exists and where the free space is. Except oth cluster.

2. Where is it specified for block media that reads of segments x[i], x[j] may fail out of all the requested segments from [1..n]?
And I have never come across any driver or protocol that retries the (i+1)th segment when the ith segment has failed. That is why I prefer my suggestion.
And the solution I was proposing is to return the status of the I/O failure
to the layer above as soon as possible, and to handle the card-removal
interrupt, or any other issue in h/w, s/w or a combination of both,
separately. Or just think again about the case where the platform doesn't
have a card-removal interrupt.
So my patch addresses the 1st part
It is absolutely unacceptable to return I/O errors to the upper layers
for segments that do not have errors.
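
One way to picture the requirement above is a retry loop that records a
status per segment, so that only the segments that really fail are reported
as errors. This is a sketch under assumptions, not the mmc block driver's
code: the callback type, sketch_retry_per_segment() and the example callback
sketch_one_bad_segment() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true if segment index can be read. */
typedef bool (*sketch_read_segment_fn)(size_t index, void *ctx);

/*
 * After a failed multi-segment read, retry one segment at a time.
 * ok[i] records the outcome for segment i, so readable segments are
 * still completed successfully.  Returns the number of failed segments.
 */
size_t sketch_retry_per_segment(size_t nsegments,
                                sketch_read_segment_fn read_segment,
                                void *ctx, bool *ok)
{
    size_t failed = 0;

    assert(read_segment != NULL && ok != NULL);
    for (size_t i = 0; i < nsegments; i++) {
        ok[i] = read_segment(i, ctx);
        if (!ok[i])
            failed++;
    }
    return failed;
}

/* Example callback: exactly one segment, given through ctx, is unreadable. */
bool sketch_one_bad_segment(size_t index, void *ctx)
{
    const size_t *bad = ctx;
    return index != *bad;
}
```

The cost is the long retry sequence discussed earlier in the thread; the
benefit is that a single bad segment does not turn its neighbours into
errors.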
And for the 2nd part we can submit the patch anytime.
I can suggest a patch if you want but I am on vacation next week so
it will have to wait a couple of weeks.
And moreover we should not give the interleaved data to applications, as we
have no way to tell the application which data is valid.
[..snip..]
http://comments.gmane.org/gmane.linux.kernel.mmc/2714
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html