Thread: 6 messages, 3 authors, 2019-12-02

Re: general protection fault in __schedule (2)

From: Dmitry Vyukov <dvyukov@google.com>
Date: 2019-11-28 09:53:25
Also in: kvm, lkml

On Mon, Nov 25, 2019 at 6:54 PM Sean Christopherson
[off-list ref] wrote:
> On Sat, Nov 23, 2019 at 06:15:15AM +0100, Dmitry Vyukov wrote:
> > On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
> > [off-list ref] wrote:
> > > On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > > > syzbot has bisected this bug to:
> > > >
> > > > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > > > Author: Jim Mattson [off-list ref]
> > > > Date:   Tue Jul 10 09:27:20 2018 +0000
> > > >
> > > >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> > > >
> > > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > > > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > > > git tree:       upstream
> > > > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> > > >
> > > > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > > > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> > > >
> > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> > >
> > > Is there a way to have syzbot stop processing/bisecting these things
> > > after a reasonable amount of time?  The original crash is from August of
> > > last year...
> > >
> > > Note, the original crash is actually due to KVM's put_kvm() fd race, but
> > > whatever we want to blame, it's a duplicate.
> > >
> > > #syz dup: general protection fault in kvm_lapic_hv_timer_in_use
> >
> > Hi Sean,
> >
> > syzbot only sends bisection results to open bugs with no known fixes.
> > So what you did (marking the bug as invalid/dup, or attaching a fix)
> > would stop it from doing/sending bisection.
> >
> > "Original crash happened a long time ago" is not necessarily a good
> > signal. On the syzbot dashboard
> > (https://syzkaller.appspot.com/upstream), you can see bugs with the
> > original crash 2+ years ago, but they are still pretty much relevant.
> > The default kernel development process strategy for invalidating bug
> > reports by burying them in oblivion has advantages, but also
> > downsides. FWIW syzbot prefers explicit status tracking.
>
> I have no objection to explicit status tracking or getting pinged on old
> open bugs.  I suppose I don't even mind the belated bisection, I'd probably
> whine if syzbot didn't do the bisection :-).
>
> What's annoying is the report doesn't provide any information about when it
> originally occurred or on what kernel it originally failed.  It didn't occur
> to me that the original bug might be a year old and I only realized it was
> from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
> log.  Knowing that the original crash was a year old would have saved me
> 5-10 minutes of getting myself oriented.
>
> Could syzbot provide the date and reported kernel version (assuming the
> kernel version won't be misleading) of the original failure in its reports?

+syzkaller mailing list for syzbot discussion

We tried to provide some aggregate info in email reports a long time ago
(like trees where it occurred, number of crashes). The problem was
that any such info captured in emails becomes stale very quickly. E.g.
later somebody looks at the report and thinks "oh, linux-next only"
or "it happened only once", but maybe that has not been true for a long
time. E.g. if we say "it last happened 3 months ago", maybe it has just
happened again by the time we send it... While this "emails always
provide latest updates" approach works for the kernel in other contexts
b/c updates are provided by humans and there is no other source of
truth, it does not play well with automated systems: syzbot would need
to send several emails per second, because that's really the rate at
which things change.

If we add some info, which one should it be? The original crash, the
one used for bisection, or the latest one? All these are different...
syzbot does not know "4.19.0-rc4+" strings for commits, it generally
identifies commits by hashes. There are dates, but then again which
one? Author or commit? The Author date is what is generally shown, but
I remember a number of patches where the Author date is 1.5 years old
for just-merged commits :)

There is another problem: if we stuff too much info into emails,
people will stop reading them. This is a very serious and real concern.
If you have a 1000-page manual, it's well documented, but it's
equivalent to no docs at all: nobody is reading 1000 pages to find 1
bit of info. Especially if you don't know that there is an important
bit that you need to find in the first place...

What would be undoubtedly positive is presenting information on the
dashboard better (if we find a way).
Currently the page says near the top:

First crash: 478d, last: 430d

The idea was that "last: 430d" is supposed to communicate the bit of
info that confused you. Is it what you were looking for? Is there a
better way to present it?
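
To make the two relative ages concrete, here is a minimal sketch that
converts them back to approximate calendar dates. It assumes the ages
are whole days counted from the date of this message (2019-11-28); the
helper name age_to_date is hypothetical and is not part of syzbot or
the dashboard.

```python
from datetime import date, timedelta


def age_to_date(age: str, as_of: date) -> date:
    """Convert a relative age such as "478d" into a calendar date."""
    if not age.endswith("d"):
        raise ValueError(f"expected an age in days like '478d', got {age!r}")
    return as_of - timedelta(days=int(age[:-1]))


# Assumption: the dashboard page was read on the date of this message.
as_of = date(2019, 11, 28)

for label, age in (("first crash", "478d"), ("last crash", "430d")):
    print(f"{label}: {age} ago, about {age_to_date(age, as_of).isoformat()}")
```

Under that assumption the first crash lands in early August 2018, which
matches the "August of last year" remark quoted above.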

Unfortunately most such problems are much harder if extended beyond
1 concrete case...