Re: INFO: rcu detected stall in ndisc_alloc_skb

From: Dmitry Vyukov <dvyukov@google.com>
Date: 2019-01-07 11:13:09
Also in: linux-mm, lkml

On Sun, Jan 6, 2019 at 2:47 PM Tetsuo Handa
[off-list ref] wrote:

On 2019/01/06 22:24, Dmitry Vyukov wrote:

quoted

A report at 2019/01/05 10:08 from "no output from test machine (2)"
( https://syzkaller.appspot.com/text?tag=CrashLog&x=1700726f400000 )
says that there are flood of memory allocation failure messages.
Since continuous memory allocation failure messages itself is not
recognized as a crash, we might be misunderstanding that this problem
is not occurring recently. It will be nice if we can run testcases
which are executed on bpf-next tree.

What exactly do you mean by running test cases on bpf-next tree?
syzbot tests bpf-next, so it executes lots of test cases on that tree.
One can also ask for patch testing on bpf-next tree to test a specific
test case.

syzbot ran "some tests" before getting this report, but we can't find from
this report what the "some tests" are. If we could record all tests executed
in syzbot environments before getting this report, we could rerun the tests
(with manually examining where the source of memory consumption is) in local
environments.

Filed https://github.com/google/syzkaller/issues/917 for this.

Since syzbot is now using memcg, maybe we can test with sysctl_panic_on_oom == 1.
Any memory consumption that triggers global OOM killer could be considered as
a problem (e.g. memory leak or uncontrolled memory allocation).

Interesting idea. This will also alleviate the previous problem as I
think only a stream of OOMs currently produces 1+MB of output.

+Shakeel who was interested in catching more memcg-escaping allocations.

To do this we need a buy-in from kernel community to consider this as
a bug/something to fix in kernel. Systematic testing can't work gray
checks requiring humans to look at each case and some cases left as
being working-as-intended.

There are also 2 interesting points:
 - testing of kernel without memcg-enabled (some kernel users
obviously do this); it's doable, but currently syzkaller have no
precedents/infrastructure to consider some output patterns as bugs or
not depending on kernel features
 - false positives for minimized C reproducers that have memcg code
stripped off (people complain that reproducers are too large/complex
otherwise)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help