Re: [syzbot] unexpected kernel reboot (4)
From: Dmitry Vyukov <dvyukov@google.com>
Date: 2021-04-22 17:00:40
Also in: lkml
On Thu, Apr 22, 2021 at 6:13 PM Tetsuo Handa [off-list ref] wrote:
> On 2021/04/22 23:20, Dmitry Vyukov wrote:
>> I've prepared this syzkaller change: https://github.com/google/syzkaller/pull/2550/files
>
> OK. Please merge and let's see whether syzkaller can find different ways.

Merged. Thanks for digging into this.
> In my environment, this problem behaves in a very puzzling way. While the reproducer I use is single-threaded, changing the timing via CONFIG_DEBUG_KOBJECT=y, or even via https://syzkaller.appspot.com/x/patch.diff?x=13d69ffed00000, avoids this problem. I can't narrow down what is happening.

This:

-	kill_cad_pid(SIGINT, 1);

suggests the change can help... I think... This is good.
>> Re hibernation/suspend configs, you said disabling them is not helping, right? Does it still make sense to disable them? If these configs are enabled, we can at least find some bugs in the suspend preparation code. However, as you noted, it will immediately lead to "lost connection". Ideally, we would somehow tweak hibernation/suspend to reach the hibernation/suspend point and then immediately and automatically resume.
>
> That will be one of the disable-specific-functionality changes.
>
>> This way we could test both the suspend and resume code, which I assume can have bugs, without causing "lost connection" at the same time. I guess such a mode does not exist today... and I am not sure what happens to TCP connections after this.
>
> I don't know whether ssh sessions can survive 10 seconds of hibernation/suspend. But maybe disabling the hibernation/suspend configs until the disable-specific-functionality changes are accepted makes sense.
We would need to disable CONFIG_SUSPEND and CONFIG_HIBERNATION. I am wondering whether we will gain more than we lose... We will lose coverage of these subsystems, but this will eliminate some of the "lost connection" crashes. Do you have any idea how many "lost connection" crashes this could prevent?
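For reference, disabling both would roughly amount to the following lines in the kernel .config. This is only a sketch. It assumes nothing else in the syzbot config selects these options, and it does not check the Kconfig dependencies that may pull in related PM options:

```
# Hypothetical fragment: turn off suspend-to-RAM and hibernation
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
```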