Re: Degraded array but drive healthy

From: Phill Watkins <hidden>
Date: 2013-12-06 11:07:50

Hi,

Thanks for your advice.

I ran a non-destructive badblocks test on the drive last night. The
Multi_Zone_Error_Rate jumped to 9898 and the machine crashed (I can
only assume the terminal was overloaded or something).

I also have an output file full of bad blocks but SMART still shows no errors.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0026   056   056   000    Old_age   Always       -       11660
      3 Spin_Up_Time            0x0023   089   089   025    Pre-fail  Always       -       3460
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
      5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
      8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4012
     10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       37
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
    191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
    194 Temperature_Celsius     0x0002   064   064   000    Old_age   Always       -       31 (Min/Max 21/36)
    195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
    196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       9898
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       37
    225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2055

    SMART Error Log Version: 1
    No Errors Logged
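
For anyone wanting to pull the raw values out of a table like this one, here is a rough Python sketch. It assumes each attribute sits on a single line, the way smartctl -A prints it before a mail client wraps it. The names parse_attributes, nonzero and WATCHED are made up for illustration, and the choice of attributes to watch is only an assumption, not an official list of failure indicators.

```python
import re

# One attribute per line: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the raw value (leading integer only).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def parse_attributes(text):
    """Return {attribute_name: raw_value} for every row that matches."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(10))
    return attrs

def nonzero(attrs, names):
    """Keep only the named attributes whose raw value is not zero."""
    return {n: attrs[n] for n in names if attrs.get(n, 0) != 0}

# A few rows copied from the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       9898
"""

# Hypothetical watch list; adjust to taste.
WATCHED = [
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Multi_Zone_Error_Rate",
]

attrs = parse_attributes(SAMPLE)
print(nonzero(attrs, WATCHED))  # {'Multi_Zone_Error_Rate': 9898}
```

On the rows above, only Multi_Zone_Error_Rate comes back non-zero, which matches what the table shows by eye.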

Can I assume this is a bad disk and go ahead with an RMA or can the
Multi_Zone_Error_Rate indicate some other issue?

Thanks

P.S. Yes, I used smartctl -t long when I tested the drive.

On 4 December 2013 22:59, Mathias Burén [off-list ref] wrote:

On 4 December 2013 22:23, Phill Watkins [off-list ref] wrote:

Hi,

I have an issue that I can't really pin down.

I have two RAID 1 arrays, one for /boot and another for an LVM.

Yesterday one of the arrays (the LVM) became degraded after a reboot
which included an automated fsck on all filesystems.

I've run full SMART tests on both drives and both completed without errors:
[SNIP]

I'd really appreciate some advice.

Regards
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hi,

By a full SMART self-test, do you mean smartctl -t long? You could try a
non-destructive badblocks session on both drives, but it takes a while.
http://www.pantz.org/software/badblocks/badblocksusage.html

Regards,
Mathias

