Re: Software RAID6 broke after power outage
From: Wols Lists <hidden>
Date: 2020-07-22 09:14:48
On 22/07/20 08:41, Cory Derenburger wrote:
My server lost power this morning. The server is running Linux Mint (14?) on a battery backup, and I believe it shut down before losing power. Upon restart the computer hung for a while, and after resetting and booting into recovery mode my RAID is now nonfunctional. The server was set up years ago with a RAID 6 array built with mdadm. To be honest, I don't really know what is wrong with the array; it seems to be an issue with disk sdc. I wanted to reach out for help to confirm the issue and get some guidance before proceeding (or making things worse). Any assistance that can help me determine what steps to take to get this server back up and running would be greatly appreciated. It's been 4+ years since I have touched RAID, and I have only attempted a recovery once. If anyone can help I would be super appreciative.
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
https://raid.wiki.kernel.org/index.php/Asking_for_help

I see you've included some stuff which is helpful, but can you do everything that last page asks for? In particular, lsdrv.
Below I'm including outputs from various commands for the 3rd disk
which seems to be the culprit
dmesg - boot section where the first errors begin occurring
[ 2.637856] md: bind<sdd1>
[ 2.646987] random: nonblocking pool is initialized
[ 2.647432] md: bind<sde1>
[ 2.651429] md: bind<sdb1>
[ 2.863538] ata3.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[ 2.863594] ata3.00: irq_stat 0x40000008
[ 2.863643] ata3.00: failed command: READ FPDMA QUEUED
[ 2.863695] ata3.00: cmd 60/08:20:08:08:00/00:00:00:00:00/40 tag 4 ncq 4096 in
[ 2.863695] res 41/40:00:09:08:00/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[ 2.863775] ata3.00: status: { DRDY ERR }
[ 2.863822] ata3.00: error: { UNC }
[ 2.873407] ata3.00: configured for UDMA/133
[ 2.873476] sd 2:0:0:0: [sdc] Unhandled sense code
[ 2.873525] sd 2:0:0:0: [sdc]
[ 2.873571] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2.873619] sd 2:0:0:0: [sdc]
[ 2.873665] Sense Key : Medium Error [current] [descriptor]
[ 2.873819] Descriptor sense data with sense descriptors (in hex):
[ 2.873901] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 2.874544] 00 00 08 09
[ 2.874764] sd 2:0:0:0: [sdc]
[ 2.874811] Add. Sense: Unrecovered read error - auto reallocate failed
[ 2.874895] sd 2:0:0:0: [sdc] CDB:
[ 2.874941] Read(10): 28 00 00 00 08 08 00 00 08 00
[ 2.875428] end_request: I/O error, dev sdc, sector 2057
[ 2.875478] Buffer I/O error on device sdc1, logical block 1
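
A quick sanity check on where that bad sector sits, as a rough sketch. The failing sector (2057), the partition start (2048) and the 512-byte sector size all come from the dmesg and fdisk output in this mail; the variable names are just mine.

```shell
#!/bin/sh
# Values taken from the dmesg and fdisk output in this thread.
failed_lba=2057        # "end_request: I/O error, dev sdc, sector 2057"
part_start=2048        # start sector of /dev/sdc1
sector_size=512        # logical sector size reported by fdisk

# Offset of the bad sector inside the partition, in sectors and in bytes.
offset_sectors=$(( failed_lba - part_start ))
offset_bytes=$(( offset_sectors * sector_size ))

# Which 4096-byte block of sdc1 that falls in (dmesg says "logical block 1").
block_4k=$(( offset_bytes / 4096 ))

echo "bad sector is $offset_sectors sectors ($offset_bytes bytes) into sdc1, 4K block $block_4k"
```

A version-1.2 superblock normally sits 4 KiB from the start of the member device, which is that same 4K block. That would be consistent with mdadm finding no superblock on sdc1, but treat it as a working guess until the drive has been looked at properly.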
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdb1[0](S) sde1[3](S) sdd1[2](S)
5860147464 blocks super 1.2
{not sure why these drives are now showing as spares}

This is very common when an array fails to assemble properly. Unfortunately, when there's one error, it often triggers a cascade of fake errors, and this is probably the case here.
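
If it helps to see this at a glance, here is a small sketch that lists inactive arrays and counts how many of their members carry the (S) flag. The function name list_inactive is made up for this mail, and it assumes the usual /proc/mdstat layout shown above.

```shell
#!/bin/sh
# list_inactive [FILE]: print each inactive md array and how many of its
# listed members carry the (S) flag. FILE defaults to /proc/mdstat.
list_inactive() {
    awk '$2 == ":" && $3 == "inactive" {
        flagged = 0
        for (i = 4; i <= NF; i++)
            if ($i ~ /\(S\)$/) flagged++
        printf "%s: inactive, %d of %d members flagged (S)\n", $1, flagged, NF - 3
    }' "${1:-/proc/mdstat}"
}

# Only touch /proc/mdstat if this machine actually has one.
if [ -r /proc/mdstat ]; then
    list_inactive
fi
```

On an inactive array the (S) only says the members were bound but the array was never started. On its own it does not mean the drives have been turned into spares.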
Below is the output of mdadm --examine for sdc. sdb, sdd and sde appear fine when checked.

mdadm --examine /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)

mdadm --examine /dev/sdc1
mdadm: No md superblock detected on /dev/sdc1.

fdisk -l

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x38389fdc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd108824d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6207659a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd9a4afcf

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048  3907029167  1953513560   fd  Linux raid autodetect

Is there other information needed to determine the issue? Where do I go from here?
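
Since the three other members are said to look fine, it would help to see their superblocks side by side. A rough sketch, assuming the device names in this mail are still current; summarise_examine is a made-up helper name, and the field labels are the ones mdadm --examine prints for 1.2 metadata.

```shell
#!/bin/sh
# summarise_examine: read `mdadm --examine` output on stdin and keep only
# the fields that matter when comparing members of one array.
summarise_examine() {
    grep -E '^[[:space:]]*(Array UUID|Raid Level|Raid Devices|Update Time|Events|Device Role|Array State) :'
}

# Members named in this thread; adjust if the device names have moved.
for dev in /dev/sdb1 /dev/sdd1 /dev/sde1; do
    [ -b "$dev" ] || continue      # skip anything that is not a block device here
    echo "== $dev"
    mdadm --examine "$dev" 2>&1 | summarise_examine
done
```

If Events and Update Time agree across the three, they are in step with each other. RAID 6 tolerates two missing members, so three good members out of what looks like four should in principle be enough to run degraded, but please post the output before trying anything.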
How old is Linux Mint? Have you kept it up-to-date? Unfortunately, it seems a lot of older systems suffer issues when the kernel is heavily patched and mdadm is not updated, and this regularly surfaces on this list where Ubuntu is concerned ...

Can you post the output of:

mdadm --version
uname -a

Make sure you have a "latest and greatest" rescue disk to hand, and we'll see what the others say.

Cheers,
Wol
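
To gather everything in one go, something along these lines might do. It is only a sketch: report is a made-up helper, the smartctl line is my assumption about what is worth including, and lsdrv has to be fetched separately before that line will produce anything.

```shell
#!/bin/sh
# report CMD [ARGS...]: print a header, then run the command if it is
# installed; a missing tool is noted instead of aborting the script.
report() {
    echo "### $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(not installed: $1)"
    fi
}

report uname -a
report mdadm --version
report cat /proc/mdstat
# smartctl comes from smartmontools; sdc is the suspect drive in this thread.
report smartctl --xall /dev/sdc
# lsdrv is the separate script the Asking_for_help wiki page asks for.
report lsdrv
```

Run it as root and redirect the output to a file (say raid-report.txt, the name is arbitrary) so it can be pasted into a reply in one piece. Nothing in it writes to the drives.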