Thread: 15 messages, 5 authors, 2011-07-11

Re: NAND BBT corruption on MPC83xx

From: Matthew L. Creech <hidden>
Date: 2011-07-05 19:58:50

On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood [off-list ref] wrote:
It seems that the generic code always passes -1 with PAGEPROG, and only
provides the actual page address on SEQIN.

I don't think the ECC readback is needed, and the fact that it looks like
it has always been broken would seem to confirm that.  It's broken in
other ways, too -- it assumes a particular ECC layout.  Let's get rid of it.

As for the corruption, could it be degradation from repeated reads of that
one page?

I modified nanddump to do repeated reads, and compare the data
obtained from the first iteration with that obtained later (to detect
bit-flips).  I tried 3 different variations:

- one which reads the first page (2k) of the last block
- one which reads the second page (2k) of the last block
- one which reads the entire last block (128k), just for comparison

As I understand it, read-disturb would primarily come into play when
the second page is read, since it's adjacent to the first page (please
correct me if I'm wrong there).  Anyway, all 3 of these tests were run
for at least 50 million read cycles, with no bit-flips detected.  So
I'm somewhat doubtful that this is the cause of the BBT corruption
I've been seeing.
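
For illustration, the comparison step described above might look roughly
like the sketch below.  This is not the actual nanddump modification: the
function names (count_bit_flips, read_full, repeated_read_check) are made
up for this example, and it assumes the region can be read with pread()
from an already-open file descriptor.  Note also that an ordinary read
through the MTD layer typically returns ECC-corrected data, so flips that
ECC corrects would not be visible to a comparison like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bits that differ between two buffers of equal length. */
static unsigned long count_bit_flips(const unsigned char *ref,
                                     const unsigned char *cur, size_t len)
{
    unsigned long flips = 0;

    assert(ref != NULL && cur != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char diff = ref[i] ^ cur[i];

        while (diff) {
            diff &= diff - 1;   /* clear the lowest set bit */
            flips++;
        }
    }
    return flips;
}

/* Read exactly len bytes at offset.  Returns 0 on success, -1 on failure. */
static int read_full(int fd, unsigned char *buf, size_t len, off_t offset)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);

        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: read the same region `iterations` times in total and
 * compare every later read against the first one.  Returns the total number
 * of differing bits, or -1 on allocation or read failure.
 */
static long repeated_read_check(int fd, off_t offset, size_t len,
                                unsigned long iterations)
{
    unsigned char *ref = malloc(len);
    unsigned char *cur = malloc(len);
    long total = -1;

    if (!ref || !cur || read_full(fd, ref, len, offset) != 0)
        goto out;

    total = 0;
    for (unsigned long i = 1; i < iterations; i++) {
        if (read_full(fd, cur, len, offset) != 0) {
            total = -1;
            break;
        }
        total += (long)count_bit_flips(ref, cur, len);
    }
out:
    free(ref);
    free(cur);
    return total;
}
```

With 2k pages and a 128k block, the three variations would correspond to
calling repeated_read_check() with len = 2048 at the start of the last
block, len = 2048 one page further in, and len = 131072 at the start of
the last block.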

====

Separately, I set up 2 test devices to run while I was away last week.
One of them contained 2 patches:

- Mike Hench's patch which eliminates this block of code in fsl_elbc_nand.c
- Adam Thomson's patch
(http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)
which initializes oob_poi correctly

Upon my return, the device with these patches saw no problems at all,
and had no additional bad blocks.  The device without these patches
had some 200+ blocks which had been newly marked as bad in the BBT
over the course of 10 days.  After rebooting, this latter device then
failed to boot, as shown here:

http://mcreech.com/work/bbt-ecc-error4.txt

I'm currently running another test to verify which of the two patches
actually fixed this problem (which might take a few days), but it
seems like removing that block of code in fsl_elbc_nand.c is a good
idea.

-- 
Matthew L. Creech