Thread: 13 messages, 4 authors, 2015-05-22

Re: Recent drive errors

From: Thomas Fjellstrom <hidden>
Date: 2015-05-21 12:45:54

On Thu 21 May 2015 09:58:48 AM Mikael Abrahamsson wrote:
On Tue, 19 May 2015, Thomas Fjellstrom wrote:
How many UREs are considered "ok"? Tens, hundreds, thousands, tens of
thousands?
I will replace any drive that has developed UNC sectors a few times, so
I'd say "less than 10".
In this case, it looked like 5 UNC errors for a single sector, plus some weird
latency patterns, until I ran badblocks -w on it. After that it reported more
than 10k reallocated sectors and many thousands more uncorrectable sectors.
Before the badblocks test it "looked" ok; now it's most definitely dead.
+1 on the "set kernel timeout to more than 120 seconds". I have this in
/etc/rc.local:

for x in /sys/block/sd[a-z] ; do
    # Raise the kernel's command timeout (in seconds) for this drive
    echo 180 > "$x/device/timeout"
done

echo 4096 > /sys/block/md0/md/stripe_cache_size
I presume it's ok to do that even if the drives do ERC/TLER? Just woke up, but 
my brain seems to be telling me it shouldn't break anything since the ERC 
drives should always return after 7s no matter what...
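
A minimal sketch of the same timeout loop, wrapped in a function so it can be pointed at a test directory and so each write is read back. The name set_scsi_timeouts and its two optional arguments are hypothetical, added here for illustration only. The sketch assumes, as reasoned above, that a uniform 180 second timeout is harmless on drives that already give up after about 7s.

```shell
#!/bin/sh
# Sketch only: set_scsi_timeouts is a hypothetical helper, not part of any tool.
# Usage: set_scsi_timeouts [sysfs_root] [seconds]
set_scsi_timeouts() {
    root=${1:-/sys/block}   # assumption: standard sysfs location
    secs=${2:-180}          # value taken from the rc.local snippet above
    for dev in "$root"/sd*; do
        f="$dev/device/timeout"
        # Skip unmatched globs and devices without a writable timeout file
        [ -w "$f" ] || continue
        echo "$secs" > "$f"
        # Read the value back so a write that did not take effect is visible
        echo "${dev##*/}: $(cat "$f")"
    done
}
```

Calling set_scsi_timeouts with no arguments (as root, for example from /etc/rc.local) applies 180 seconds to every sd* device and prints one line per drive. Whether a given drive has ERC enabled can be checked separately with smartctl -l scterc /dev/sdX.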

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca