Re: Stopping raid6 (with journal) hangs w/ 100%CPU
From: Shaohua Li <shli@kernel.org>
Date: 2017-12-01 20:34:08
On Thu, Nov 30, 2017 at 11:22:23AM -0500, Larkin Lowrey wrote:
Seems locks are involved, and in a not so nice way. I had been chasing a problem that I thought was with the SAS controllers where the IO for the drives would stall for extended periods of time. The data from iostat would show 100% utilization but zero I/O and 1 or 2 avgqu-sz. Once the IO would resume the await value would be exactly the amount of time the IO had been stalled. I was sure it was the SAS controller or driver but it turns out it was the raid6 journal! During the periods of stalled IO, the mdX_raid6 process would go to 100% CPU and when the IO resumed that process would be back down at ~50%. When I switched back to resync (disabled the journal), those IO stalls went away. It appears to me that the bad locking behavior is the culprit. My priority now is to disable the journals on my two remaining arrays which boot up with mdX_raid6 at 100% CPU. Because of their state mdadm is unable to act on them at all. How can I disable the journal of these arrays which are 'inactive'?
I think we should fix the journal bug. Can you describe how you trigger this? anything abnormal happens? Thanks, Shaohua
--Larkin On 11/27/2017 9:42 PM, Larkin Lowrey wrote:quoted
The kernel thread is spending a lot of time in handle_active_stripes but not staying in there since I see it calling handle_stripe and then I also see calls to __release_stripe then back to handle_stripe again. So, it's clearly exiting handle_active_stripes but getting dumped right back in there. Presumably, handle_active_stripes is not completing the necessary work and the kernel thread keeps calling it expecting the pending work to be completed. When the system boots up an mdadm process runs at near 100% CPU with sporadic reads of the journal device. After ~30s or so, the IO stops, the mdadm process disappears, and the mdX_raid6 kernel thread starts taking 100% CPU.quoted
RIP: 0010:do_release_stripe+0x52/0x410 [raid456] RSP: 0018:ffffbb7d41dcbcc0 EFLAGS: 00000017 RAX: 0000000002000000 RBX: ffff8f10c819bf20 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffff8f10c819bf20 RDI: ffff8f114a2ad400 RBP: ffffbb7d41dcbce8 R08: ffff8f113e3dd888 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8f10c819bf30 R13: ffff8f113e3ddb58 R14: ffff8f113e3dd800 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8f115ed40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb7675e1000 CR3: 0000000412cf5000 CR4: 00000000000406e0 Call Trace: __release_stripe+0x16/0x20 [raid456] handle_active_stripes.isra.60+0x44e/0x5a0 [raid456] raid5d+0x42e/0x630 [raid456]--Larkin On 11/24/2017 2:20 PM, Larkin Lowrey wrote:quoted
After hard-rebooting, this instance (stripe_cache_active: 2) assembled just fine on boot. The next time I encountered this the array was 'inactive' on boot. There was a flurry of I/O initially (which seems to indicate journal re-play, then the array becoming 'active') but the I/O ceased without the array becoming active. This time... stripe_cache_active: 2376quoted
md125 : inactive md127p4[9](J) sdk1[2] sdl1[3] sdn1[5] sdo1[6] sdm1[4] sdj1[1] sdq1[8] sdp1[7] 31258219068 blocks super 1.2quoted
# mdadm -D /dev/md125 /dev/md125: Version : 1.2 Creation Time : Thu Oct 19 10:11:35 2017 Raid Level : raid6 Used Dev Size : 18446744073709551615 Raid Devices : 8 Total Devices : 9 Persistence : Superblock is persistent Update Time : Fri Nov 24 13:41:38 2017 State : active, FAILED, Not Started Active Devices : 8 Working Devices : 9 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K Consistency Policy : journal Name : ########:3 UUID : de6a2ce0:1a4c510f:d7c89da4:1215a312 Events : 156844 Number Major Minor RaidDevice State - 0 0 0 removed - 0 0 1 removed - 0 0 2 removed - 0 0 3 removed - 0 0 4 removed - 0 0 5 removed - 0 0 6 removed - 0 0 7 removed - 259 3 - spare /dev/md127p4 - 8 225 5 sync /dev/sdo1 - 8 209 4 sync /dev/sdn1 - 8 193 3 sync /dev/sdm1 - 8 177 2 sync /dev/sdl1 - 8 161 1 sync /dev/sdk1 - 8 145 0 sync /dev/sdj1 - 65 1 7 sync /dev/sdq1 - 8 241 6 sync /dev/sdp1--Larkin On 11/23/2017 1:22 PM, Larkin Lowrey wrote:quoted
Sometimes, stopping a raid6 array (with journal) hangs, the mdX_raid6 process pegs at 100% CPU, and there is no I/O. Looks like it's stuck in an infinite loop. Kernel: 4.13.13-200.fc26.x86_64 The stack trace (echo l > /proc/sysrq-trigger) is always the same:quoted
handle_stripe+0x10c/0x2140 [raid456] ? pick_next_task_fair+0x491/0x550 handle_active_stripes.isra.60+0x3e5/0x5a0 [raid456] raid5d+0x42e/0x630 [raid456] ? prepare_to_wait_event+0x79/0x160 md_thread+0x125/0x170 ? md_thread+0x125/0x170 ? finish_wait+0x80/0x80 kthread+0x125/0x140 ? state_show+0x2f0/0x2f0 ? kthread_park+0x60/0x60 ? do_syscall_64+0x67/0x140 ret_from_fork+0x25/0x30The array is healthy, has a journal, and writes were idle for several minutes prior to running 'mdadm --stop'.quoted
md124 : active raid6 sdt1[6] sds1[5] sdw1[1] sdx1[2] sdy1[3] sdu1[7] sdv1[8] sdz1[4] md125p4[9](J) 23442092928 blocks super 1.2 level 6, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]stripe_cache_active: 2 stripe_cache_size: 32768 array_state: write-pending journal_mode: write-through [write-back] consistency_policy: journal --Larkin -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html