[PATCH v9 2/5] mtd: nand: vf610_nfc: add hardware BCH-ECC support
From: stefan@agner.ch (Stefan Agner)
Date: 2015-07-31 23:39:11
Also in:
linux-devicetree, lkml
Hi Brian, On 2015-08-01 01:09, Brian Norris wrote: <snip>
quoted
+static inline int vf610_nfc_correct_data(struct mtd_info *mtd, uint8_t *dat) +{ + struct vf610_nfc *nfc = mtd_to_nfc(mtd); + u8 ecc_status; + u8 ecc_count; + int flip; + + ecc_status = __raw_readb(nfc->regs + ECC_SRAM_ADDR * 8 + ECC_OFFSET); + ecc_count = ecc_status & ECC_ERR_COUNT; + if (!(ecc_status & ECC_STATUS_MASK)) + return ecc_count; + + /* + * On an erased page, bit count should be zero or at least + * less then half of the ECC strength + */ + flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);Sorry I didn't notice this earlier, but it appears you are falling into the same trap that almost everyone else is -- it is not sufficient to check just the page area; you also need to check the OOB. Suppose that a MTD user wrote mostly-0xff data to the page, then the page accumulates bitflips in the spare area and a few in the page area, such that eventually HW ECC can't correct them. If there are few enough zero bits in the data area, you will mistakenly think that this is a blank page below, and memset() it to 0xff. That would be disastrous! Fortunately, your code is otherwise quite well structured and looks good. A tip below.quoted
+ + if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2)) + return -1; + + /* Erased page. */ + memset(dat, 0xff, nfc->chip.ecc.size); + return 0; +} + +static int vf610_nfc_read_page(struct mtd_info *mtd, struct nand_chip *chip, + uint8_t *buf, int oob_required, int page) +{ + int eccsize = chip->ecc.size; + int stat; + + vf610_nfc_read_buf(mtd, buf, eccsize); + + if (oob_required) + vf610_nfc_read_buf(mtd, chip->oob_poi, mtd->oobsize);To fix the bitflips issue above, you'll just want to unconditionally read the OOB (it's fine to ignore 'oob_required') and...quoted
+ + stat = vf610_nfc_correct_data(mtd, buf);...pass in chip->oob_poi as a third argument.
Hm, this probably will have an effect on performance, since we usually omit the OOB if not requested. I could fetch the OOB from the NAND controllers SRAM only if necessary (if HW ECC status is not ok...). Does this sound reasonable?
quoted
+ + if (stat < 0) + mtd->ecc_stats.failed++; + else + mtd->ecc_stats.corrected += stat;You've got another problem here: ecc.read_page() should be returning 'max_bitflips' here. So, since you have a single ECC region, this block should probably be: if (stat < 0) { mtd->ecc_stats.failed++; return 0; } else { mtd->ecc_stats.corrected += stat; return stat; }
Ok, will change that. -- Stefan