Re: PROBLEM: double fault in md_end_io
From: Song Liu <song@kernel.org>
Date: 2021-04-15 06:35:56
On Wed, Apr 14, 2021 at 5:36 PM Song Liu [off-list ref] wrote:
On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha [off-list ref] wrote:
Hello Song,

That code does not compile, but I guess that what you meant was something like this:

Yeah... I am really sorry for the noise.
diff --git drivers/md/md.c drivers/md/md.c
index 04384452a..cbc97a96b 100644
--- drivers/md/md.c
+++ drivers/md/md.c
@@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
 
 static struct kobj_type md_ktype;
 
+struct kmem_cache *md_io_cache;
 struct md_cluster_operations *md_cluster_ops;
 EXPORT_SYMBOL(md_cluster_ops);
 static struct module *md_cluster_mod;
@@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
 	 */
 	mddev->hold_active = UNTIL_STOP;
 
-	error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
-					  sizeof(struct md_io));
+	error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
+				       md_io_cache);
 	if (error)
 		goto abort;
 
@@ -9542,6 +9543,10 @@ static int __init md_init(void)
 {
 	int ret = -ENOMEM;
 
+	md_io_cache = KMEM_CACHE(md_io, 0);
+	if (!md_io_cache)
+		goto err_md_io_cache;
+
 	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
 	if (!md_wq)
 		goto err_wq;
@@ -9578,6 +9583,8 @@ static int __init md_init(void)
 err_misc_wq:
 	destroy_workqueue(md_wq);
 err_wq:
+	kmem_cache_destroy(md_io_cache);
+err_md_io_cache:
 	return ret;
 }
 
@@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
 	destroy_workqueue(md_rdev_misc_wq);
 	destroy_workqueue(md_misc_wq);
 	destroy_workqueue(md_wq);
+	kmem_cache_destroy(md_io_cache);
 }
 
 subsys_initcall(md_init);
[...]
$ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
MemTotal: 528235648 kB    MemFree: 20002732 kB
MemAvailable: 483890268 kB    Buffers: 7356 kB
Cached: 495416180 kB    SwapCached: 0 kB
Active: 96396800 kB    Inactive: 399891308 kB
Active(anon): 10976 kB    Inactive(anon): 890908 kB
Active(file): 96385824 kB    Inactive(file): 399000400 kB
Unevictable: 78768 kB    Mlocked: 78768 kB
SwapTotal: 0 kB    SwapFree: 0 kB
Dirty: 88422072 kB    Writeback: 948756 kB
AnonPages: 945772 kB    Mapped: 57300 kB
Shmem: 26300 kB    KReclaimable: 7248160 kB
Slab: 7962748 kB    SReclaimable: 7248160 kB
SUnreclaim: 714588 kB    KernelStack: 18288 kB
PageTables: 10796 kB    NFS_Unstable: 0 kB
Bounce: 0 kB    WritebackTmp: 0 kB
CommitLimit: 264117824 kB    Committed_AS: 21816824 kB
VmallocTotal: 34359738367 kB    VmallocUsed: 561588 kB
VmallocChunk: 0 kB    Percpu: 65792 kB
HardwareCorrupted: 0 kB    AnonHugePages: 0 kB
ShmemHugePages: 0 kB    ShmemPmdMapped: 0 kB
FileHugePages: 0 kB    FilePmdMapped: 0 kB
HugePages_Total: 0    HugePages_Free: 0
HugePages_Rsvd: 0    HugePages_Surp: 0
Hugepagesize: 2048 kB    Hugetlb: 0 kB
DirectMap4k: 541000 kB    DirectMap2M: 11907072 kB
DirectMap1G: 525336576 kB

And thanks for this information. I have set up a system to run the test, using the code at the top of the md-next branch. I will post a status update later tonight.
I have not been able to reproduce the issue after 6 hours of testing. This may be because I am running the tests on 3 partitions of the same NVMe SSD. I will try on a different host with multiple SSDs.

Pawel, have you tried to reproduce the issue with the md-next branch?

Thanks,
Song