Re: mdadm stuck at 0% reshape after grow
From: Edward Kuns <hidden>
Date: 2017-12-06 20:19:17
On Wed, Dec 6, 2017 at 10:21 AM, Phil Turmel [off-list ref] wrote:
> The problem with the BBL right now is its existence.
I have a couple questions:
1) If I have bad blocks lists configured, how do I safely remove them?
I checked my three arrays, and a BBL is configured on two of the eight
partitions that make them up:
# mdadm --examine-badblocks /dev/sda5 /dev/sdb3
No bad-blocks list configured on /dev/sda5
No bad-blocks list configured on /dev/sdb3
# mdadm --examine-badblocks /dev/sda3 /dev/sdb2
No bad-blocks list configured on /dev/sda3
No bad-blocks list configured on /dev/sdb2
# mdadm --examine-badblocks /dev/sda2 /dev/sdb1 /dev/sdc1 /dev/sdd1
No bad-blocks list configured on /dev/sda2
No bad-blocks list configured on /dev/sdb1
Bad-blocks list is empty in /dev/sdc1
Bad-blocks list is empty in /dev/sdd1
I replaced sdc and sdd a couple years ago when one of the two failed.
(Both were the same Seagate model, which turned out to have a
particularly high failure rate that wasn't obvious when I bought them,
so I replaced both.) Apparently when I replaced them, I inadvertently
enabled the BBL on them.
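
To check this across many partitions at once, a small filter over the
--examine-badblocks output might help. This is only a sketch: the
function name bbl_summary is made up, and it recognizes just the two
message forms shown above. Anything else (for example, whatever mdadm
prints for a non-empty list) is passed through marked "unknown".

```shell
# Summarize `mdadm --examine-badblocks` output read from stdin.
# Hypothetical helper; only the two message forms quoted above are parsed.
bbl_summary() {
    awk '
    /^[ \t]*No bad-blocks list configured on / { print "no-bbl", $NF; next }
    /^[ \t]*Bad-blocks list is empty in /      { print "empty-bbl", $NF; next }
    NF                                         { print "unknown:", $0 }
    '
}
```

Usage would be something like
"mdadm --examine-badblocks /dev/sd[a-d]1 2>&1 | bbl_summary", run as root.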
2) Wol, should there be a section on the Wiki about "Things you should
make sure you have configured" that includes disabling the BBL (unless
you know what you're doing), making sure you're scrubbing regularly,
making sure you have drives that support scterc (or if you don't,
configuring /sys/block/<device>/device/timeout), and so on? Perhaps a
list of information you should have handy before disaster strikes to
make life a lot easier if it does? E.g., running lsdrv or dumping
partition tables to text files or listing information about your RAID
configuration and LVM, etc.
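
As a rough idea of what "information to have handy" could look like in
practice, here is a sketch that saves the layout to text files. The
function names (run_to, save_layout), the DRY_RUN switch and the file
names are all my own invention; the commands themselves (sfdisk -d,
mdadm --detail --scan, lsblk, lvs) are standard, but which ones matter
depends on your setup.

```shell
# Run a command and save its output to a file.
# With DRY_RUN=1 the command is only printed, not executed.
run_to() {
    out=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$* > $out"
    else
        "$@" > "$out" 2>&1
    fi
}

# save_layout OUTDIR DEVICE...
# Hypothetical helper: snapshot RAID, LVM and partition info into OUTDIR.
save_layout() {
    dir=$1; shift
    [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$dir"
    run_to "$dir/mdstat.txt" cat /proc/mdstat
    run_to "$dir/mdadm-scan.txt" mdadm --detail --scan
    run_to "$dir/lsblk.txt" lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
    run_to "$dir/lvm.txt" lvs -a -o +devices
    for dev in "$@"; do
        # One partition-table dump per whole disk, e.g. sda.sfdisk.txt
        run_to "$dir/${dev#/dev/}.sfdisk.txt" sfdisk -d "$dev"
    done
}
```

Something like "save_layout /root/layout /dev/sd?" as root, with the
result copied somewhere that is not on the array, would be the idea.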
I have an unrelated question that came up while poking around to gather
the above information. I just realized that this code that I put in
/etc/rc.d/rc.local doesn't work for me, because smartctl does not
return an error when the drive lacks SCT support:
# Force drives to play nice with MD
for i in /dev/sd? ; do
    if smartctl -l scterc,70,70 $i > /dev/null ; then
        echo -n $i " is good "
    else
        echo 180 > /sys/block/${i/\/dev\/}/device/timeout
        echo -n $i " is bad "
    fi;
    smartctl -i $i | egrep "(Device Model|Product:)"
    blockdev --setra 1024 $i
done
If I check this manually, I notice that smartctl returns 0 whether the
command succeeds or fails.
# smartctl -l scterc,70,70 /dev/sdb ; echo $?
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-100.fc23.x86_64]
(local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Commands not supported
0
This is an old Linux version and I need to upgrade, I know. Hopefully
over the holidays. But I got that scriptlet above from this mailing
list and I see it at
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch -- so did the
smartctl behavior change at some point?
# smartctl --version
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-100.fc23.x86_64]
(local build)
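
One possible workaround, as a sketch only: look at what smartctl prints
rather than at its exit status. The function names here are
hypothetical, and the only failure text matched is the "not supported"
wording from the output above; other smartctl versions may word it
differently.

```shell
# Return 0 unless the given smartctl output reports that SCT is unsupported.
# The match string comes from the "SCT Commands not supported" output above.
scterc_output_ok() {
    case "$1" in
        *"not supported"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: try to set ERC to 7.0 s, else raise the kernel timeout.
# Needs root and a real drive, e.g. set_erc_or_timeout /dev/sdb
set_erc_or_timeout() {
    dev=$1
    # A nonzero exit status is also treated as failure, which errs on the
    # side of raising the kernel timeout.
    if out=$(smartctl -l scterc,70,70 "$dev" 2>&1) && scterc_output_ok "$out"; then
        echo "$dev is good"
    else
        echo 180 > "/sys/block/${dev#/dev/}/device/timeout"
        echo "$dev is bad"
    fi
}
```

In rc.local the loop body would then become a call such as
set_erc_or_timeout "$i".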
Thanks,
Eddie