Re: mdadm grow raid 5 to 6 failure (crash)
From: David Gilmour <hidden>
Date: 2023-05-08 05:58:07
I'm not sure what I'm looking for here, but here is the output of the
inflight file immediately after the mdadm assemble hangs. Does this
indicate that something is accessing the array?
#cat /sys/block/md127/inflight
1 0
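For reference, the two columns in that file are the number of read and write requests currently in flight on the device, so "1 0" means one read is outstanding and no writes. Below is a minimal sketch that labels the two values. The function name show_inflight is hypothetical (it is not part of mdadm or the kernel), and it takes the file path as an argument so it can be pointed at any block device:

```shell
#!/bin/sh
# show_inflight is a hypothetical helper, not part of mdadm or the kernel.
# It reads a sysfs "inflight" file, which holds two whitespace-separated
# counters (reads in flight, then writes in flight), and prints them labelled.
show_inflight() {
    # $1: path to the inflight file, e.g. /sys/block/md127/inflight
    read -r reads writes < "$1" || return 1
    echo "reads=$reads writes=$writes"
}

# Example: show_inflight /sys/block/md127/inflight
```

Running it in a loop (for example with `watch`) would show whether the outstanding read ever completes or stays stuck.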
Also attached is an strace of my mdadm command that hung in case that
reveals something relevant:
strace mdadm --assemble --verbose
--backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
/dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
tee mdadm_strace_output.txt
On Sun, May 7, 2023 at 7:23 PM Yu Kuai [off-list ref] wrote:
> Hi,
>
> On 2023/05/06 21:19, David Gilmour wrote:
>> From what I can tell it does look very similar. I stopped the
>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
>> still hung on the assemble command. I'm not savvy enough to decode the
>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in the
>> process stack output suggest the same i/o block the other post
>> indicates?
>>
>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
>>      Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
>>      Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11 MDT; 1min 27s ago
>>    Duration: 1d 20h 16min 29.633s
>> TriggeredBy: × systemd-udevd-kernel.socket
>>              × systemd-udevd-control.socket
>>        Docs: man:systemd-udevd.service(8)
>>              man:udev(7)
>>     Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd (code=exited, status=203/EXEC)
>>    Main PID: 27440 (code=exited, status=203/EXEC)
>>         CPU: 5ms
>>
>> ----------------------
>>
>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sdb /dev/sdf --force
>> mdadm: looking for devices for /dev/md127
>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
>> mdadm: /dev/md127 has an active reshape - checking if critical section needs to be restored
>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
>> mdadm: Failed to find backup of critical section
>> mdadm: continuing without restoring backup
>> mdadm: added /dev/sdh to /dev/md127 as 1
>> mdadm: added /dev/sdg to /dev/md127 as 2
>> mdadm: added /dev/sdc to /dev/md127 as 3
>> mdadm: added /dev/sdb to /dev/md127 as 4
>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
>> mdadm: added /dev/sda to /dev/md127 as 0
>> #hangs indefinitely at this point in the output
>>
>> ------------------------------------------
>>
>> root 27454 0.0 0.0 3812 2656 pts/1 D+ 07:00 0:00 mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sdb /dev/sdf --force
>> root 27457 0.0 0.0 0 0 ? S 07:00 0:00 [md127_raid6]
>>
>> #cat /proc/27454/stack
>> [<0>] mddev_suspend.part.0+0xdf/0x150
>> [<0>] suspend_lo_store+0xc5/0xf0
>> [<0>] md_attr_store+0x83/0xf0
>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
>> [<0>] new_sync_write+0xff/0x190
>> [<0>] vfs_write+0x1ef/0x280
>> [<0>] ksys_write+0x5f/0xe0
>> [<0>] do_syscall_64+0x5c/0x90
>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> #cat /proc/27457/stack
>> [<0>] md_thread+0x122/0x160
>> [<0>] kthread+0xe0/0x100
>> [<0>] ret_from_fork+0x22/0x30
>
> Is there any thread stuck at raid5_make_request? Something like below:
>
> Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0 pid: 8121 ppid: 706 flags:0x00000006
> Apr 23 19:17:22 atom kernel: Call Trace:
> Apr 23 19:17:22 atom kernel: <TASK>
> Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
> Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
> Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
> Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
> Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70
> Apr 23 19:17:22 atom kernel: raid5_make_request+0x2cb/0x3e0 [raid456]
> Apr 23 19:17:22 atom kernel: ? sched_show_numa+0xf0/0xf0
> Apr 23 19:17:22 atom kernel: md_handle_request+0x132/0x1e0
> Apr 23 19:17:22 atom kernel: ? do_mpage_readpage+0x282/0x6b0
> Apr 23 19:17:22 atom kernel: __submit_bio+0x86/0x130
> Apr 23 19:17:22 atom kernel: __submit_bio_noacct+0x81/0x1f0
> Apr 23 19:17:22 atom kernel: mpage_readahead+0x15c/0x1d0
> Apr 23 19:17:22 atom kernel: ? blkdev_write_begin+0x20/0x20
> Apr 23 19:17:22 atom kernel: read_pages+0x58/0x2f0
> Apr 23 19:17:22 atom kernel: page_cache_ra_unbounded+0x137/0x180
> Apr 23 19:17:22 atom kernel: force_page_cache_ra+0xc5/0xf0
> Apr 23 19:17:22 atom kernel: filemap_get_pages+0xe4/0x350
> Apr 23 19:17:22 atom kernel: filemap_read+0xbe/0x3c0
> Apr 23 19:17:22 atom kernel: ? make_kgid+0x13/0x20
> Apr 23 19:17:22 atom kernel: ? deactivate_locked_super+0x90/0xa0
> Apr 23 19:17:22 atom kernel: blkdev_read_iter+0xaf/0x170
> Apr 23 19:17:22 atom kernel: new_sync_read+0xf9/0x180
> Apr 23 19:17:22 atom kernel: vfs_read+0x13c/0x190
> Apr 23 19:17:22 atom kernel: ksys_read+0x5f/0xe0
> Apr 23 19:17:22 atom kernel: do_syscall_64+0x59/0x90
>
> By the way, cat /sys/block/mdxx/inflight can prove this as well.
>
> If this is the case, can you find out who is accessing the array?
>
> Thanks,
> Kuai
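One way to look for a task stuck in raid5_make_request, as asked above, is to list every process in uninterruptible sleep (state D) and dump its kernel stack. What follows is only a sketch under assumptions: d_state_pids is a hypothetical helper name, it expects "pid stat comm" lines on stdin, and it keeps only the first word of the command name.

```shell
#!/bin/sh
# d_state_pids is a hypothetical helper: it filters "pid stat comm" lines
# (as printed by `ps -eo pid=,stat=,comm=`) and prints the pid and command
# name of every task whose state starts with D (uninterruptible sleep).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Assumed usage, run as root so /proc/<pid>/stack is readable:
#   ps -eo pid=,stat=,comm= | d_state_pids | while read -r pid comm; do
#       echo "== $pid ($comm)"
#       cat "/proc/$pid/stack"
#   done
```

Any stack in that output containing raid5_make_request would point at the process that is issuing I/O to the array while the assemble is blocked in mddev_suspend.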
Attachments
- mdadm_strace_output.zip [application/x-zip-compressed] 24214 bytes