Re: INFO: rcu detected stall in ndisc_alloc_skb
From: Dmitry Vyukov <dvyukov@google.com>
Date: 2019-01-07 11:13:09
Also in: linux-mm, lkml
On Sun, Jan 6, 2019 at 2:47 PM Tetsuo Handa [off-list ref] wrote:
> On 2019/01/06 22:24, Dmitry Vyukov wrote:
>>> A report at 2019/01/05 10:08 from "no output from test machine (2)" ( https://syzkaller.appspot.com/text?tag=CrashLog&x=1700726f400000 ) says that there is a flood of memory allocation failure messages. Since a continuous stream of memory allocation failure messages is not itself recognized as a crash, we might be wrongly assuming that this problem has not occurred recently. It would be nice if we could run the test cases that are executed on the bpf-next tree.
>>
>> What exactly do you mean by running test cases on bpf-next tree? syzbot tests bpf-next, so it executes lots of test cases on that tree. One can also ask for patch testing on bpf-next tree to test a specific test case.
>
> syzbot ran "some tests" before getting this report, but we can't tell from this report what those "some tests" are. If we could record all tests executed in syzbot environments before getting this report, we could rerun the tests (while manually examining where the source of memory consumption is) in local environments.
Filed https://github.com/google/syzkaller/issues/917 for this.
> Since syzbot is now using memcg, maybe we can test with sysctl_panic_on_oom == 1. Any memory consumption that triggers the global OOM killer could be considered a problem (e.g. memory leak or uncontrolled memory allocation).
Interesting idea. This will also alleviate the previous problem, as I think only a stream of OOMs currently produces 1+MB of output.

+Shakeel who was interested in catching more memcg-escaping allocations.

To do this we need buy-in from the kernel community to consider this a bug/something to fix in the kernel. Systematic testing can't work with gray checks that require humans to look at each case, with some cases left as working-as-intended.

There are also 2 interesting points:
- testing of kernels without memcg enabled (some kernel users obviously do this); it's doable, but currently syzkaller has no precedent/infrastructure for treating some output patterns as bugs or not depending on kernel features
- false positives for minimized C reproducers that have the memcg code stripped off (people complain that reproducers are too large/complex otherwise)
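
To make the "bug or not depending on kernel features" point concrete, here is a minimal sketch of such a check. It is not syzkaller code: the function name, the return labels and the memcg_enabled flag are hypothetical, and the console patterns are assumptions about typical kernel output whose exact wording varies between kernel versions.

```python
import re

# Assumed console patterns (wording differs between kernel versions).
# A global OOM kill starts with a capital "Out"; the memcg variant reads
# "Memory cgroup out of memory", so a case-sensitive match separates them.
GLOBAL_OOM = re.compile(r"\bOut of memory: Kill(?:ed)? process")
MEMCG_OOM = re.compile(r"Memory cgroup out of memory")
OOM_PANIC = re.compile(r"Kernel panic - not syncing: Out of memory")


def classify_oom(console_log, memcg_enabled):
    """Return "bug", "expected" or "none" for a console log (hypothetical labels)."""
    # A panic caused by panic_on_oom is a crash regardless of configuration.
    if OOM_PANIC.search(console_log):
        return "bug"
    # A global OOM kill is only suspicious when tests are supposed to be
    # confined by memcg; without memcg it is ordinary behaviour.
    if GLOBAL_OOM.search(console_log):
        return "bug" if memcg_enabled else "expected"
    # An OOM kill inside a memory cgroup means the limit did its job.
    if MEMCG_OOM.search(console_log):
        return "expected"
    return "none"
```

The second point above still applies to a check like this: a minimized C reproducer with the memcg setup stripped off would hit the global OOM path, so the flag would have to describe what the reproducer actually does and not only the kernel config.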