Thread (3 messages) 3 messages, 2 authors, 2013-05-16

RE: Kernel crash at md_seq_show of drivers/md/md.c

From: Mo, Moore <hidden>
Date: 2013-05-16 12:35:25
Subsystem: software raid (multiple disks) support, the rest · Maintainers: Song Liu, Yu Kuai, Linus Torvalds

On Wed, 15 May 2013 20:11:52 +0800 "Mo, Moore" [off-list ref] wrote:
quoted
Dear Neil,

    Thank you for your clue. I make deeper trace after that and found another suspicion point. Kernel maybe wakeup a mdk_thread which was kfree in failed case of mddev->pers->run(mddev).

    For make sure my guess, I add some printk in order to get more info. It seems that md_seq_show try to wake up a mdk_thread in mddev_unlock() before md_seq_show return, but the thread was kfree already by "out_free_conf" case of run() function in drivers/md/raid10.c.

    The Oops report as attached.
I don't really have time for bug reports against ancient kernels - sorry.
If you think you've found a bug, please at least look in the current mainline kernel to see if the code has changed, and preferably reproduce against the current mainline kernel.

In this case the bug was fixed by commit
      01f96c0a9922cd9919baf9d16febdf7016177a12
18 months ago (linux 3.1).
Thanks your information and sorry about that I haven't look through latest mainline kernel this time. I will pay attention to it in future.

I have merged the patch from linux 3.1 (with a little adaptation for 2.6.37), and verified on my issue theater. it works well. The Oops disappeared. Thank you.

Could you mind I ask an extended question: There is a NULL pointer protector in md_wakeup_thread. Why not set the thread pointer to NULL instant after md_unregister_thread in fail case of run? Just like as fail case of raid5 implement in 01f96c0a9922cd9919baf9d16febdf7016177a12~1. 

-------------------------------------------------
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 43709fa..ac5e8b5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4941,8 +4941,7 @@ static int run(mddev_t *mddev)

        return 0;
 abort:
-       md_unregister_thread(mddev->thread);
-       mddev->thread = NULL;
+       md_unregister_thread(&mddev->thread);
        if (conf) {
                print_raid5_conf(conf);
                free_conf(conf);
-------------------------------------------------
I surmise the reason is: the run() procedure is running in the context which was pointed by mddev->thead itself. Is it right?
But I have tried this way before I wrote this mail (actually before merge the commit of 3.1). It seems also work well. The "wakeup thread oops" disappeared too, and no another oops or KERN_ERR print output.

Question just for my tech interest. Excuse me if it disturb you.



Best Regards,
Moore(莫谋鑫)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help