Re: Raid5 assemble after dual sata port failure
From: Chris Eddington <hidden>
Date: 2007-11-11 17:41:13
Yes, there is some kind of media error message in dmesg (below). It is not random; it happens at exactly the same moments in each xfs_repair -n run.

Nov 11 09:48:25 altair kernel: [37043.300691] res 51/40:00:01:00:00/00:00:00:00:00/e1 Emask 0x9 (media error)
Nov 11 09:48:25 altair kernel: [37043.304326] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:48:25 altair kernel: [37043.307672] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:48:25 altair kernel: [37043.307676] ata4.00: configured for UDMA/133
Nov 11 09:48:25 altair kernel: [37043.307684] ata4: EH complete
Nov 11 09:48:27 altair kernel: [37043.747838] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Nov 11 09:48:27 altair kernel: [37043.747861] sdd: Write Protect is off
Nov 11 09:48:27 altair kernel: [37043.747878] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 11 09:49:19 altair kernel: [37065.709216] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:19 altair kernel: [37065.720197] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:19 altair kernel: [37065.732188] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:19 altair kernel: [37065.732192] ata4.00: configured for UDMA/133
Nov 11 09:49:19 altair kernel: [37065.732199] ata4: EH complete
Nov 11 09:49:21 altair kernel: [37067.206243] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:21 altair kernel: [37067.210721] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:21 altair kernel: [37067.215727] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:21 altair kernel: [37067.215731] ata4.00: configured for UDMA/133
Nov 11 09:49:21 altair kernel: [37067.215738] ata4: EH complete
Nov 11 09:49:24 altair kernel: [37068.107825] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:24 altair kernel: [37068.112730] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:24 altair kernel: [37068.117732] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:24 altair kernel: [37068.117736] ata4.00: configured for UDMA/133
Nov 11 09:49:24 altair kernel: [37068.117740] ata4: EH complete
Nov 11 09:49:26 altair kernel: [37069.095665] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:26 altair kernel: [37069.100156] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:26 altair kernel: [37069.105148] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:26 altair kernel: [37069.105152] ata4.00: configured for UDMA/133
Nov 11 09:49:26 altair kernel: [37069.105159] ata4: EH complete
Nov 11 09:49:28 altair kernel: [37069.996842] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:28 altair kernel: [37070.000912] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:28 altair kernel: [37070.005916] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:28 altair kernel: [37070.005919] ata4.00: configured for UDMA/133
Nov 11 09:49:28 altair kernel: [37070.005924] ata4: EH complete
Nov 11 09:49:31 altair kernel: [37070.983850] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:31 altair kernel: [37070.987914] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:31 altair kernel: [37070.992917] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:31 altair kernel: [37070.992920] ata4.00: configured for UDMA/133
Nov 11 09:49:31 altair kernel: [37070.992935] ata4: EH complete
Nov 11 09:49:31 altair kernel: [37071.000639] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Nov 11 09:49:31 altair kernel: [37071.000719] sdd: Write Protect is off
Nov 11 09:49:31 altair kernel: [37071.000745] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 11 09:49:31 altair kernel: [37071.000762] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Nov 11 09:49:31 altair kernel: [37071.000770] sdd: Write Protect is off
Nov 11 09:49:31 altair kernel: [37071.000788] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 11 09:49:33 altair kernel: [37072.213749] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:33 altair kernel: [37072.218227] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:33 altair kernel: [37072.223231] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:33 altair kernel: [37072.223235] ata4.00: configured for UDMA/133
Nov 11 09:49:33 altair kernel: [37072.223242] ata4: EH complete
Nov 11 09:49:36 altair kernel: [37073.283239] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:36 altair kernel: [37073.286894] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:36 altair kernel: [37073.290220] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:36 altair kernel: [37073.290224] ata4.00: configured for UDMA/133
Nov 11 09:49:36 altair kernel: [37073.290231] ata4: EH complete
Nov 11 09:49:38 altair kernel: [37074.094417] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:38 altair kernel: [37074.097652] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:38 altair kernel: [37074.100988] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:38 altair kernel: [37074.100992] ata4.00: configured for UDMA/133
Nov 11 09:49:38 altair kernel: [37074.100997] ata4: EH complete
Nov 11 09:49:40 altair kernel: [37074.992267] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:40 altair kernel: [37074.996747] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:40 altair kernel: [37075.000074] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:40 altair kernel: [37075.000078] ata4.00: configured for UDMA/133
Nov 11 09:49:40 altair kernel: [37075.000083] ata4: EH complete
Nov 11 09:49:42 altair kernel: [37075.803457] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:42 altair kernel: [37075.807516] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:42 altair kernel: [37075.810842] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:42 altair kernel: [37075.810846] ata4.00: configured for UDMA/133
Nov 11 09:49:42 altair kernel: [37075.810853] ata4: EH complete
Nov 11 09:49:44 altair kernel: [37076.700452] res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)
Nov 11 09:49:44 altair kernel: [37076.704947] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:44 altair kernel: [37076.708272] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:49:44 altair kernel: [37076.708275] ata4.00: configured for UDMA/133
Nov 11 09:49:44 altair kernel: [37076.708290] ata4: EH complete
Nov 11 09:49:44 altair kernel: [37076.709550] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Nov 11 09:49:44 altair kernel: [37076.709572] sdd: Write Protect is off
Nov 11 09:49:44 altair kernel: [37076.709594] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 11 09:49:44 altair kernel: [37076.709611] SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Nov 11 09:49:44 altair kernel: [37076.709623] sdd: Write Protect is off
Nov 11 09:49:44 altair kernel: [37076.709705] SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

David Greaves wrote:
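The repeated `res` lines can be decoded to see what the drive itself reported. Below is a minimal Python sketch, assuming the `res` field is libata's dump of the result task file laid out as status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (my reading of the kernel's error report, not something stated in this thread). The helper names `decode_res` and `flags` are hypothetical. The status and error bit names are the standard ATA register bits: 0x51 decodes to DRDY, DSC and ERR, and error 0x40 to UNC (uncorrectable data error), which fits the "(media error)" tag. Counting identical dumps shows that the same failure keeps recurring.

```python
import re
from collections import Counter

# Standard ATA Status and Error register bits, highest bit first.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}

# Assumed layout of the libata "res" field:
# status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_RE = re.compile(
    r"res ([0-9a-f]{2})/([0-9a-f]{2}):((?:[0-9a-f]{2}:){3}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def flags(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(line):
    """Decode one dmesg line containing a libata 'res' dump, or return None."""
    m = RES_RE.search(line)
    if m is None:
        return None
    return {
        "status": flags(int(m.group(1), 16), STATUS_BITS),
        "error": flags(int(m.group(2), 16), ERROR_BITS),
        "taskfile": m.group(0)[len("res "):],  # raw register dump, used to spot repeats
    }


if __name__ == "__main__":
    sample = [
        "res 51/40:00:01:00:00/00:00:00:00:00/e1 Emask 0x9 (media error)",
        "res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)",
        "res 51/40:00:0f:00:00/00:00:00:00:00/ef Emask 0x9 (media error)",
    ]
    decoded = [decode_res(line) for line in sample]
    print(decoded[0]["status"], decoded[0]["error"])  # ['DRDY', 'DSC', 'ERR'] ['UNC']
    print(Counter(d["taskfile"] for d in decoded).most_common())
```

Feeding the whole `dmesg` output through `decode_res` and counting the `taskfile` values is a quick way to confirm that every xfs_repair pass trips over the same spot rather than random ones.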
> Chris Eddington wrote:
>> Hi, thanks for the pointer on xfs_repair -n. It actually tells me something (some listed below), but I'm not sure what it means; there seems to be a lot of data loss. One complication is that I see an error message on ata6, so I moved the disks around thinking it was a flaky SATA port, but I see the error again on ata4, so it seems to follow the disk. But it happens at exactly the same point in the xfs_repair sequence, so I don't think it is a flaky disk.
>
> Does dmesg have any info/SATA errors? xfs_repair will have problems if the disk is bad. You may want to image the disk (possibly onto the 'spare'?) if it is bad.
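The "image the disk" suggestion can be sketched as a block-by-block copy that zero-fills whatever cannot be read, so every byte in the copy stays at the same offset as on the source and the repair can then run against the copy. This is an illustration only: `image_with_holes` is a hypothetical helper, the 64 KiB block size is arbitrary, and it assumes a Linux/Unix system where `os.pread` is available.

```python
import os


def image_with_holes(src_path, dst_path, block_size=64 * 1024):
    """Copy src_path to dst_path block by block (hypothetical helper).

    A block that raises a read error (e.g. EIO from a bad sector) or comes back
    short is padded with zeros, so the copy keeps the source's offsets.
    Returns the list of (offset, length) ranges that were padded.
    """
    bad = []
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or a block device.
        size = os.lseek(src_fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    data = b""
                if len(data) < length:
                    # Keep whatever was read and zero-fill the rest of the block.
                    bad.append((offset, length))
                    data = data.ljust(length, b"\0")
                dst.write(data)
                offset += length
    finally:
        os.close(src_fd)
    return bad
```

A call would look like `image_with_holes("/dev/sdd", "/mnt/other/sdd.img")` (both paths are examples; double-check the destination before pointing anything at real devices). In practice GNU ddrescue is the usual tool for this job: it retries unreadable areas in smaller pieces and keeps a map file, whereas this sketch gives up on a whole block at the first error.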
>> I'll go to the xfs mailing list on this.
>
> Very good idea :)
>> Is there a way to be sure the disk order is right?
>
> The order looks right to me. xfs_repair wouldn't recognise it as well as it does if the order was wrong.
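Why a wrong disk order would be obvious to xfs_repair: md spreads RAID5 data and parity over the members in a rotating pattern, so assembling the members in the wrong slots makes md read chunks from the wrong devices. The sketch below assumes the left-symmetric layout (mdadm's usual RAID5 default; `mdadm --detail` shows the actual layout) and uses hypothetical device names; `left_symmetric` and `physical_drive` are illustrative helpers, not md code.

```python
def left_symmetric(chunk, n_disks):
    """Map a logical data chunk to (stripe, data_slot, parity_slot), left-symmetric RAID5.

    Each stripe holds n_disks - 1 data chunks plus one parity chunk. Parity starts
    on the last slot and moves one slot to the left per stripe; data chunks start
    on the slot just after parity and wrap around.
    """
    stripe, index = divmod(chunk, n_disks - 1)
    parity_slot = (n_disks - 1) - (stripe % n_disks)
    data_slot = (parity_slot + 1 + index) % n_disks
    return stripe, data_slot, parity_slot


def physical_drive(chunk, slot_order):
    """Return the drive md would read a logical chunk from.

    slot_order[i] is the drive assembled into RAID slot i (hypothetical names).
    """
    _, slot, _ = left_symmetric(chunk, len(slot_order))
    return slot_order[slot]


if __name__ == "__main__":
    correct = ["sda", "sdb", "sdc", "sdd"]
    swapped = ["sdb", "sda", "sdc", "sdd"]  # first two members exchanged
    for chunk in range(6):
        print(chunk, physical_drive(chunk, correct), physical_drive(chunk, swapped))
```

With the swapped order, chunks 0, 1, 4 and 5 of the first six are read from the wrong drive, and the same happens in every later stripe, so filesystem metadata would be scrambled at regular intervals and parity would no longer match. A checker facing that would report far more damage than a mostly recognisable filesystem.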
>> not way out of whack since I'm seeing so much from xfs_repair. Also, since I've been moving the disks around, I want to be sure I have the right order.
>
> Bear in mind that -n stops the repair from fixing a problem. Then, as the 'repair' proceeds, it becomes very confused by problems that should have been fixed. This is evident in the superblock issue (which also probably explains the failed mount).
>> Is there a way to try restoring using the other disk?
>
> No, the event count was very out of date.
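On "the event count was very out of date": each md member's superblock carries an event counter (shown as "Events" by `mdadm --examine`), and a member whose counter lags behind the others holds an older state of the array. Below is a simplified sketch of sorting members into fresh and stale, assuming the counts have already been read into a dict. `current_members`, `max_lag` and the device names and counts are hypothetical; the real kernel/mdadm logic handles more cases (for example, `mdadm --assemble --force` can pull in a slightly stale member).

```python
def current_members(events, max_lag=0):
    """Split md members into (fresh, stale) by superblock event count.

    events maps device name -> event counter. A member lagging the newest
    counter by more than max_lag is treated as stale (simplified model).
    """
    newest = max(events.values())
    fresh = sorted(dev for dev, count in events.items() if newest - count <= max_lag)
    stale = sorted(dev for dev, count in events.items() if newest - count > max_lag)
    return fresh, stale


if __name__ == "__main__":
    # Hypothetical counts: three members agree, one dropped out long ago.
    counts = {"sda1": 5120, "sdb1": 5120, "sdc1": 5120, "sdd1": 4302}
    print(current_members(counts))
```

A member that falls into the stale list this far behind is missing every write made since it dropped out, which is why it cannot stand in for the failing disk.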