Thread: 13 messages, 3 authors, 2021-05-08

Re: PROBLEM: double fault in md_end_io

From: Song Liu <song@kernel.org>
Date: 2021-04-15 06:35:56

On Wed, Apr 14, 2021 at 5:36 PM Song Liu [off-list ref] wrote:
On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
[off-list ref] wrote:
Hello Song,

That code does not compile, but I guess that what you meant was
something like this:
Yeah, I am really sorry for the noise.
diff --git drivers/md/md.c drivers/md/md.c
index 04384452a..cbc97a96b 100644
--- drivers/md/md.c
+++ drivers/md/md.c
@@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);

 static struct kobj_type md_ktype;

+struct kmem_cache *md_io_cache;
 struct md_cluster_operations *md_cluster_ops;
 EXPORT_SYMBOL(md_cluster_ops);
 static struct module *md_cluster_mod;
@@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
         */
        mddev->hold_active = UNTIL_STOP;

-   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
-                     sizeof(struct md_io));
+   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
+                     md_io_cache);
    if (error)
        goto abort;
@@ -9542,6 +9543,10 @@ static int __init md_init(void)
 {
    int ret = -ENOMEM;

+   md_io_cache = KMEM_CACHE(md_io, 0);
+   if (!md_io_cache)
+       goto err_md_io_cache;
+
    md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
    if (!md_wq)
        goto err_wq;
@@ -9578,6 +9583,8 @@ static int __init md_init(void)
 err_misc_wq:
    destroy_workqueue(md_wq);
 err_wq:
+   kmem_cache_destroy(md_io_cache);
+err_md_io_cache:
    return ret;
 }
@@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
    destroy_workqueue(md_rdev_misc_wq);
    destroy_workqueue(md_misc_wq);
    destroy_workqueue(md_wq);
+   kmem_cache_destroy(md_io_cache);
 }

 subsys_initcall(md_init);
[...]
$ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
MemTotal:       528235648 kB    MemFree:        20002732 kB
MemAvailable:   483890268 kB    Buffers:            7356 kB
Cached:         495416180 kB    SwapCached:            0 kB
Active:         96396800 kB     Inactive:       399891308 kB
Active(anon):      10976 kB     Inactive(anon):   890908 kB
Active(file):   96385824 kB     Inactive(file): 399000400 kB
Unevictable:       78768 kB     Mlocked:           78768 kB
SwapTotal:             0 kB     SwapFree:              0 kB
Dirty:          88422072 kB     Writeback:        948756 kB
AnonPages:        945772 kB     Mapped:            57300 kB
Shmem:             26300 kB     KReclaimable:    7248160 kB
Slab:            7962748 kB     SReclaimable:    7248160 kB
SUnreclaim:       714588 kB     KernelStack:       18288 kB
PageTables:        10796 kB     NFS_Unstable:          0 kB
Bounce:                0 kB     WritebackTmp:          0 kB
CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
VmallocChunk:          0 kB     Percpu:            65792 kB
HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
FileHugePages:         0 kB     FilePmdMapped:         0 kB
HugePages_Total:       0        HugePages_Free:        0
HugePages_Rsvd:        0        HugePages_Surp:        0
Hugepagesize:       2048 kB     Hugetlb:               0 kB
DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
DirectMap1G:    525336576 kB
And thanks for this information.

I have set up a system to run the test; the code I am using is the top of the
md-next branch. I will post a status update later tonight.
I have not been able to reproduce the issue after 6 hours. Maybe that is because
I am running the tests on 3 partitions of the same NVMe SSD. I will try on a
different host with multiple SSDs.

Pawel, have you tried to reproduce it with the md-next branch?

Thanks,
Song