Re: strange problem with raid6 read errors on active non-degraded array
From: NeilBrown <hidden>
Date: 2014-07-02 10:45:02
On Wed, 02 Jul 2014 10:32:41 +0100 Pedro Teixeira [off-list ref] wrote:
- I'm having the following problem on a raid6 md volume consisting of
16 1TB Seagate SSHDs ( using kernel 3.15.3 or 3.14.0 ). mdadm is 3.3.
- every time I run a fsck.ext4 I will get the exact same errors (
...short read ). Forcing a repair on the md0 volume shows no errors
and completes without problems. All disks are active and the volume is
not degraded, still I can't get rid of the short read errors on those 16
blocks and when the filesystem is mounted the read errors will come up
from time to time as they are probably in use.
- If I try to read those blocks with dd ( dd if=/dev/md0 of=test.txt
seek=458227712 count=6 bs=4096 ) it will instantly create a 1.8T file
but the file doesn't appear to have anything in it ( and the file
doesn't take up 1.8T on disk, as the disk is much smaller )
- this started happening after having a three disk failure. I
recovered from that failure by recreating the array with the
non-failed 13 disks plus the last failed one ( events didn't differ
much ). I then re-added the other disks. The failed disks are all
physically good, tested them with hdat2 and they don't have read/write
errors so I reused them. I don't know why they failed, maybe some
incompatibility between the SSHDs and the LSI HBA controller.
root@nas3:/# dd if=/dev/md0 of=teste.txt seek=458227712 count=6 bs=4096
6+0 records in
6+0 records out
24576 bytes (25 kB) copied, 0.0019239 s, 12.8 MB/s
root@nas3:/# ls -lah teste.txt
-rw-r--r-- 1 root root 1.8T Jul 2 10:22 teste.txt
root@nas3:/#
root@nas3:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[0] sdq[15] sdp[14] sdo[17] sdn[19] sdm[16]
sdl[18] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sdb[3] sdd[2] sdc[1]
13672838144 blocks super 1.2 level 6, 512k chunk, algorithm 2
[16/16] [UUUUUUUUUUUUUUUU]
- When doing a fsck.ext4 of /dev/md0 it returns the following ( and I
can do it over and over again with the exact same errors) :
root@nas3:/# fsck.ext4 -f /dev/md0
e2fsck 1.42.10 (18-May-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 458227712 (Attempt to read block from filesystem
resulted in short read) while reading inode and block bitmaps. Ignore
error<y>? yes

Can't possibly happen! (Don't worry, I say that a lot - I'm usually wrong).

What sort of computer? In particular, is it 32bit or 64bit?

Try using 'dd' to read a few meg at various offsets (1G, 2G, 4G, 6G, 8G, ....)
and find out if there is a pattern, where it can read and where it cannot.

NeilBrown
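
The offset probing suggested above could be scripted roughly as follows. This is only a sketch, not anything from the thread: the function names (probe, gib_offsets, device_size) are made up for illustration, and it assumes a Unix system with Python 3 and read access to the device. It reads from an input offset, which is what dd does with skip=; with dd, seek= positions the output file instead, which would account for the sparse 1.8T teste.txt in the quoted transcript.

```python
import os

GIB = 1 << 30


def device_size(path):
    """Return the size in bytes of a block device or regular file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def gib_offsets(limit_bytes):
    """Byte offsets at 1G, 2G, 4G, 6G, 8G, ... that lie below limit_bytes."""
    candidates = [GIB] + [n * GIB for n in range(2, limit_bytes // GIB + 1, 2)]
    return [off for off in candidates if off < limit_bytes]


def probe(path, offsets, length=4 * 1024 * 1024):
    """Try to read `length` bytes at each offset.

    Returns a list of (offset, bytes_read, error) tuples; error is None
    unless the read raised OSError. bytes_read < length with no error
    means a short read (for example, reading past the end).
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for off in offsets:
            got = 0
            try:
                while got < length:
                    chunk = os.pread(fd, length - got, off + got)
                    if not chunk:  # nothing more to read at this position
                        break
                    got += len(chunk)
                results.append((off, got, None))
            except OSError as exc:
                results.append((off, 0, exc.strerror))
    finally:
        os.close(fd)
    return results


# Hypothetical usage against the array from the report (run as root):
#   for off, got, err in probe("/dev/md0", gib_offsets(device_size("/dev/md0"))):
#       print(off // GIB, "GiB:", got, "bytes read,", err or "ok")
```

Any offset where fewer bytes come back than requested, or where an error is reported, marks a region worth comparing against the others to look for a pattern.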