Re: Re-map disk sectors in userspace when rewriting after read errors
From: Majed B. <hidden>
Date: 2009-09-18 10:52:14
Well, I think my case is different from Matthias's, and I can't reconstruct the data anymore, as you said, Robin. So this leaves me with a degraded array with bad sectors and a dodgy filesystem.

You see, I can mount the LVM Logical Volume (formatted with XFS), but as soon as I hit some bad sectors, XFS complains and then one of the array disks jumps out. Just now, one disk exited the array and renamed itself from sdg to sdj (this is the first time this has happened).

According to smartctl -a /dev/sdj, there are no bad sectors, but I still get this in /var/log/messages:

Sep 18 07:01:38 Adam kernel: [316599.950147] sd 6:0:0:0: [sdg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 07:01:38 Adam kernel: [316599.950175] raid5:md0: read error not correctable (sector 1240859816 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950223] raid5:md0: read error not correctable (sector 1240859824 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950225] raid5:md0: read error not correctable (sector 1240859832 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950227] raid5:md0: read error not correctable (sector 1240859840 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950230] raid5:md0: read error not correctable (sector 1240859848 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950232] raid5:md0: read error not correctable (sector 1240859856 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950234] raid5:md0: read error not correctable (sector 1240859864 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950236] raid5:md0: read error not correctable (sector 1240859872 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950238] raid5:md0: read error not correctable (sector 1240859880 on sdg1).
Sep 18 07:01:38 Adam kernel: [316599.950240] raid5:md0: read error not correctable (sector 1240859888 on sdg1).
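The sector numbers in the messages above step by 8, so it can help to collapse them into runs before comparing them against a dd_rescue or ddrescue bad-block list. A minimal Python sketch follows. The function names `failed_sectors` and `contiguous_runs` are made up for illustration, and the default step of 8 is only an assumption taken from the spacing seen in this log.

```python
import re
from collections import defaultdict

# Matches the md message format seen in the log excerpt above.
PATTERN = re.compile(r"read error not correctable \(sector (\d+) on (\w+)\)")


def failed_sectors(log_text):
    """Return {device: sorted list of sector numbers} found in md error lines."""
    found = defaultdict(set)
    for match in PATTERN.finditer(log_text):
        found[match.group(2)].add(int(match.group(1)))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


def contiguous_runs(sectors, step=8):
    """Group sorted sector numbers into (first, last) runs spaced `step` apart.

    The step of 8 is an assumption based on the spacing in the log above.
    """
    runs = []
    for sector in sectors:
        if runs and sector - runs[-1][1] == step:
            runs[-1][1] = sector  # extend the current run
        else:
            runs.append([sector, sector])  # start a new run
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        for dev, sectors in failed_sectors(log.read()).items():
            for first, last in contiguous_runs(sectors):
                print(f"{dev}: sectors {first}..{last}")
```

Fed the ten raid5 lines above, this would report a single run on sdg1 starting at sector 1240859816.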
When the disk exits the array, the array becomes useless (6 out of 8 disks) and XFS complains:

Sep 18 07:01:46 Adam kernel: [316607.896293] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896374] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.
Sep 18 07:01:46 Adam kernel: [316607.896453] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0. Returning error.

Here's some info from smartctl -a /dev/sdg:

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I can't find an explanation for why the disks are behaving this way...

====================================================

Plan B: Since I cloned the disk with bad sectors to another, what would happen if I zeroed the damaged one and then cloned the clone back to it?! I do realize that there will be zeros in the areas of the bad sectors, but how will mdadm/md behave? Would a resync fail? I can run fsck at that point, and files residing on bad sectors will be the only ones affected, correct?

On Fri, Sep 18, 2009 at 1:22 PM, Robin Hill [off-list ref] wrote:
> On Fri Sep 18, 2009 at 12:57:23PM +0300, Majed B. wrote:
>
> > Thank you for the insight, Robin. I already have used dd_rescue to find which sectors are bad, so I guess I could either wait for Matthias to finish his modifications to mdadm, or I can reconstruct the bad sectors manually (read same sector from other disks, xor all, write to damaged disk's clone).
>
> This won't work if your array is degraded though - you don't have enough data to do the reconstruction (unless you have two failed drives you can partially read?).
>
> > Weird thing though, is that when I re-read some of the bad sectors, I didn't get I/O errors ... it's confusing!
>
> Odd. I'd recommend using ddrescue rather than dd_rescue - it's faster and handles retries of bad sectors better.
>
> > Also, I'd rather avoid a fsck when I have bad sectors to not lose files. I'll run fsck once I've fixed the bad sectors and resynced the array.
>
> True - a fsck should only be done once the data's in the best possible state.
>
> Cheers,
>     Robin
> --
>      ___
>     ( ' }     |       Robin Hill        [off-list ref] |
>    / / )      | Little Jim says ....                   |
>   // !!       |      "He fallen in de water !!"        |
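The manual reconstruction mentioned in the quoted text (read the same sector from the other disks, XOR them all, write the result to the clone) can be sketched in Python as follows. This is only an illustration under stated assumptions: 512-byte sectors, every member using the same data offset so that one sector number addresses the same stripe on each device, and every *other* member being readable at that sector. As Robin points out, that last condition does not hold on an array that is already degraded. The names `xor_blocks`, `read_sectors` and `rebuild_sectors` are hypothetical, and the sketch deliberately only computes the data and writes nothing.

```python
from functools import reduce

SECTOR_SIZE = 512  # bytes per sector; an assumption, matching the units md logs


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_sectors(path, sector, count=1):
    """Read `count` sectors starting at `sector` from a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(sector * SECTOR_SIZE)
        data = dev.read(count * SECTOR_SIZE)
    if len(data) != count * SECTOR_SIZE:
        raise IOError(f"short read from {path} at sector {sector}")
    return data


def rebuild_sectors(other_members, sector, count=1):
    """Recompute the missing RAID5 member's data from all other members.

    `other_members` must list every remaining member (data and parity alike);
    with any of them missing or unreadable the result is meaningless.
    """
    return xor_blocks([read_sectors(path, sector, count) for path in other_members])
```

For the log above, a call such as `rebuild_sectors(paths_of_the_other_seven_members, 1240859816, 8)` would return the bytes that belong at that offset on the clone; the path list is a placeholder to fill in. RAID5 parity is the XOR of the data chunks in a stripe, which is why XORing all surviving members yields the missing one regardless of which member holds the parity for that stripe.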
--
Majed B.