Re: PROBLEM: double fault in md_end_io
From: Paweł Wiejacha <hidden>
Date: 2021-04-15 15:36:19
> Pawel, have you tried to repro with md-next branch?
Yes, and it also crashes:

$ git log --decorate --format=oneline
ca882ba4c0478c68f927fa7b622ec17bde9361ce (HEAD -> md-next-pw2) using slab_pool for md_io_pool
2715e61834586cef8292fcaa457cbf2da955a3b8 (song-md/md-next) md/bitmap: wait for external bitmap writes to complete during tear down
c84917aa5dfc4c809634120fb429f0ff590a1f75 md: do not return existing mddevs from mddev_find_or_alloc

<0>[ 2086.596361] traps: PANIC: double fault, error_code: 0x0
<4>[ 2086.596364] double fault: 0000 [#1] SMP NOPTI
<4>[ 2086.596365] CPU: 40 PID: 0 Comm: swapper/40 Tainted: G OE 5.12.0-rc3-md-next-pw-fix2 #1
<4>[ 2086.596365] Hardware name: empty S8030GM2NE/S8030GM2NE, BIOS V4.00 03/11/2021
<4>[ 2086.596366] RIP: 0010:__slab_free+0x26/0x380
<4>[ 2086.596367] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89 c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c 24 20 49 8b 5c
<4>[ 2086.596368] RSP: 0018:ffff96fa4d1c8f90 EFLAGS: 00010086
<4>[ 2086.596369] RAX: ffff892f4fc35300 RBX: ffff890fdca3ff78 RCX: ffff890fdca3ff78
<4>[ 2086.596370] RDX: ffff890fdca3ff78 RSI: fffff393c1728fc0 RDI: ffff892fd3a9b300
<4>[ 2086.596370] RBP: ffff96fa4d1c9030 R08: 0000000000000001 R09: ffffffffb66500a7
<4>[ 2086.596371] R10: ffff890fdca3ff78 R11: 0000000000000001 R12: fffff393c1728fc0
<4>[ 2086.596371] R13: fffff393c1728fc0 R14: ffff892fd3a9b300 R15: 0000000000000000
<4>[ 2086.596372] FS: 0000000000000000(0000) GS:ffff892f4fc00000(0000) knlGS:0000000000000000
<4>[ 2086.596373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2086.596373] CR2: ffff96fa4d1c8f88 CR3: 00000010c8210000 CR4: 0000000000350ee0
<4>[ 2086.596373] Call Trace:
<4>[ 2086.596374] <IRQ>
<4>[ 2086.596374] ? kmem_cache_free+0x3d2/0x420
<4>[ 2086.596374] ? mempool_free_slab+0x17/0x20
<4>[ 2086.596375] ? mempool_free_slab+0x17/0x20
<4>[ 2086.596375] ? mempool_free+0x2f/0x80
<4>[ 2086.596376] ? md_end_io+0x47/0x60
<4>[ 2086.596376] ? bio_endio+0xee/0x140
<4>[ 2086.596376] ? bio_chain_endio+0x2d/0x40
<4>[ 2086.596377] ? md_end_io+0x59/0x60
<4>[ 2086.596377] ? bio_endio+0xee/0x140
<4>[ 2086.596378] ? bio_chain_endio+0x2d/0x40
...
<4>[ 2086.596485] Lost 340 message(s)!

Dumping last: `watch -n0.1 ...`:

stat 451 0 13738 76 67823521 0 17162428974 801228700 351 1886784 801228776 0 0 0 0 0 0
inflight 0 354
md_io/aliases 4
md_io/align 8
md_io/cache_dma 0
md_io/cpu_partial 30
md_io/cpu_slabs 1235 N0=288 N1=239 N2=463 N3=245
md_io/destroy_by_rcu 0
md_io/hwcache_align 0
md_io/min_partial 5
md_io/objects 126582 N0=26928 N1=28254 N2=44064 N3=27336
md_io/object_size 40
md_io/objects_partial 0
md_io/objs_per_slab 102
md_io/order 0
md_io/partial 8 N2=4 N3=4
md_io/poison 0
md_io/reclaim_account 0
md_io/red_zone 0
md_io/remote_node_defrag_ratio 100
md_io/sanity_checks 0
md_io/slabs 1249 N0=264 N1=277 N2=436 N3=272
md_io/slabs_cpu_partial 1172(1172) C0=14(14) C1=17(17) C2=17(17) C3=17(17) C4=18(18) C5=17(17) C6=18(18) C7=17(17) C8=21(21) C9=4(4) C10=21(21) C11=22(22) C12=21(21) C13=25(25) C14=15(15) C15=19(19) C16=24(24) C17=19(19) C18=22(22) C19=15(15) C20=28(28) C21=18(18) C22=18(18) C23=24(24) C24=17(17) C25=27(27) C26=8(8) C27=18(18) C28=17(17) C29=18(18) C30=21(21) C31=21(21) C32=9(9) C33=9(9) C34=24(24) C35=11(11) C36=19(19) C37=14(14) C38=18(18) C39=4(4) C40=23(23) C41=20(20) C42=18(18) C43=22(22) C44=21(21) C45=24(24) C46=20(20) C47=15(15) C48=17(17) C49=15(15) C50=16(16) C51=26(26) C52=21(21) C53=19(19) C54=16(16) C55=18(18) C56=26(26) C57=14(14) C58=18(18) C59=23(23) C60=18(18) C61=20(20) C62=19(19) C63=17(17)
md_io/slab_size 40
md_io/store_user 0
md_io/total_objects 127398 N0=26928 N1=28254 N2=44472 N3=27744
md_io/trace 0
md_io/usersize 0
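As a sanity check on the md_io counters above, the per-node values can be summed and compared with the reported totals. The following is only a minimal userspace sketch with the numbers copied from the dump; the helper names (sum_nodes, slab_capacity) and the array names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-NUMA-node counters (the N0=.. N3=.. values in the dump). */
unsigned long sum_nodes(const unsigned long *per_node, size_t n)
{
	unsigned long total = 0;

	assert(per_node != NULL);
	for (size_t i = 0; i < n; i++)
		total += per_node[i];
	return total;
}

/* Object slots implied by the slab count and the objects per slab. */
unsigned long slab_capacity(unsigned long slabs, unsigned long objs_per_slab)
{
	return slabs * objs_per_slab;
}

/* Per-node values copied from md_io/objects and md_io/total_objects. */
const unsigned long md_io_objects[] = { 26928, 28254, 44064, 27336 };
const unsigned long md_io_total_objects[] = { 26928, 28254, 44472, 27744 };
```

With these inputs the per-node sums match the reported totals (126582 objects in use out of 127398 slots), and 1249 slabs of 102 objects each give the same 127398. Assuming the inflight line is the usual reads/writes pair, 126582 live md_io objects is far more than the 354 requests shown there.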
> I am not able to reproduce the issue [...]
It crashes on two different machines. Next week I'm going to upgrade the distro on an older machine (Intel NVMe disks, a different motherboard, and a Xeon instead of an EPYC 2 CPU) that currently runs linux-5.4 without this issue. I will then let you know whether switching to a newer kernel with "improved io stats accounting" makes it unstable.

Best regards,
Pawel Wiejacha.

On Thu, 15 Apr 2021 at 08:35, Song Liu [off-list ref] wrote:
> On Wed, Apr 14, 2021 at 5:36 PM Song Liu [off-list ref] wrote:
> > On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha [off-list ref] wrote:
> > > Hello Song,
> > >
> > > That code does not compile, but I guess that what you meant was
> > > something like this:
> >
> > Yeah.. I am really sorry for the noise.
> >
> > > diff --git drivers/md/md.c drivers/md/md.c
> > > index 04384452a..cbc97a96b 100644
> > > --- drivers/md/md.c
> > > +++ drivers/md/md.c
> > > @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
> > >  
> > >  static struct kobj_type md_ktype;
> > >  
> > > +struct kmem_cache *md_io_cache;
> > >  struct md_cluster_operations *md_cluster_ops;
> > >  EXPORT_SYMBOL(md_cluster_ops);
> > >  static struct module *md_cluster_mod;
> > > @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
> > >  		 */
> > >  		mddev->hold_active = UNTIL_STOP;
> > >  
> > > -	error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > -					  sizeof(struct md_io));
> > > +	error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > +				       md_io_cache);
> > >  	if (error)
> > >  		goto abort;
> > >  
> > > @@ -9542,6 +9543,10 @@ static int __init md_init(void)
> > >  {
> > >  	int ret = -ENOMEM;
> > >  
> > > +	md_io_cache = KMEM_CACHE(md_io, 0);
> > > +	if (!md_io_cache)
> > > +		goto err_md_io_cache;
> > > +
> > >  	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
> > >  	if (!md_wq)
> > >  		goto err_wq;
> > > @@ -9578,6 +9583,8 @@ static int __init md_init(void)
> > >  err_misc_wq:
> > >  	destroy_workqueue(md_wq);
> > >  err_wq:
> > > +	kmem_cache_destroy(md_io_cache);
> > > +err_md_io_cache:
> > >  	return ret;
> > >  }
> > >  
> > > @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
> > >  	destroy_workqueue(md_rdev_misc_wq);
> > >  	destroy_workqueue(md_misc_wq);
> > >  	destroy_workqueue(md_wq);
> > > +	kmem_cache_destroy(md_io_cache);
> > >  }
> > >  
> > >  subsys_initcall(md_init);
> >
> > [...]
> >
> > > $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> > > MemTotal: 528235648 kB    MemFree: 20002732 kB
> > > MemAvailable: 483890268 kB    Buffers: 7356 kB
> > > Cached: 495416180 kB    SwapCached: 0 kB
> > > Active: 96396800 kB    Inactive: 399891308 kB
> > > Active(anon): 10976 kB    Inactive(anon): 890908 kB
> > > Active(file): 96385824 kB    Inactive(file): 399000400 kB
> > > Unevictable: 78768 kB    Mlocked: 78768 kB
> > > SwapTotal: 0 kB    SwapFree: 0 kB
> > > Dirty: 88422072 kB    Writeback: 948756 kB
> > > AnonPages: 945772 kB    Mapped: 57300 kB
> > > Shmem: 26300 kB    KReclaimable: 7248160 kB
> > > Slab: 7962748 kB    SReclaimable: 7248160 kB
> > > SUnreclaim: 714588 kB    KernelStack: 18288 kB
> > > PageTables: 10796 kB    NFS_Unstable: 0 kB
> > > Bounce: 0 kB    WritebackTmp: 0 kB
> > > CommitLimit: 264117824 kB    Committed_AS: 21816824 kB
> > > VmallocTotal: 34359738367 kB    VmallocUsed: 561588 kB
> > > VmallocChunk: 0 kB    Percpu: 65792 kB
> > > HardwareCorrupted: 0 kB    AnonHugePages: 0 kB
> > > ShmemHugePages: 0 kB    ShmemPmdMapped: 0 kB
> > > FileHugePages: 0 kB    FilePmdMapped: 0 kB
> > > HugePages_Total: 0    HugePages_Free: 0
> > > HugePages_Rsvd: 0    HugePages_Surp: 0
> > > Hugepagesize: 2048 kB    Hugetlb: 0 kB
> > > DirectMap4k: 541000 kB    DirectMap2M: 11907072 kB
> > > DirectMap1G: 525336576 kB
> >
> > And thanks for this information.
> >
> > I have set up a system to run the test, the code I am using is the top of the
> > md-next branch. I will update later tonight on the status.
>
> I am not able to reproduce the issue after 6 hours. Maybe it is because I run
> tests on 3 partitions of the same nvme SSD. I will try on a different host with
> multiple SSDs.
>
> Pawel, have you tried to repro with md-next branch?
>
> Thanks,
> Song
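
The md_init() hunks in the quoted patch rely on the usual goto-based unwinding: resources are acquired in order and released in reverse order, with each error label undoing only what was set up before the failing step. The following is a small userspace model of that ordering, not kernel code; struct init_state, model_init() and the fail_at parameter are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins that only record whether a resource is held. */
struct init_state {
	bool cache_held;	/* models md_io_cache */
	bool wq_held;		/* models md_wq */
};

/*
 * Model of the init path. fail_at selects the step that fails:
 * 0 = none, 1 = cache creation, 2 = workqueue allocation,
 * 3 = some later step. Returns 0 on success, -1 on failure.
 */
int model_init(struct init_state *s, int fail_at)
{
	assert(s != NULL);
	s->cache_held = false;
	s->wq_held = false;

	if (fail_at == 1)
		goto err_cache;		/* nothing to undo yet */
	s->cache_held = true;

	if (fail_at == 2)
		goto err_wq;		/* undo the cache only */
	s->wq_held = true;

	if (fail_at == 3)
		goto err_later;		/* undo workqueue, then cache */

	return 0;

err_later:
	s->wq_held = false;		/* destroy_workqueue() in the patch */
err_wq:
	s->cache_held = false;		/* kmem_cache_destroy() in the patch */
err_cache:
	return -1;
}
```

In this model every failure path leaves both flags cleared, which corresponds to placing kmem_cache_destroy() under err_wq and the new err_md_io_cache label after it in the patch.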