Thread (20 messages, 9 authors, 2024-05-31)

Re: [TEST] Flake report

From: Simon Horman <horms@kernel.org>
Date: 2024-05-11 13:27:29

+ Aaron

On Thu, May 09, 2024 at 04:09:58PM -0700, Jakub Kicinski wrote:
Hi!

Feels like the efforts to get rid of flaky tests have slowed down a bit,
so I thought I'd poke people.

Here's the full list:
https://netdev.bots.linux.dev/flakes.html?min-flip=0&pw-y=0
Click on a test name to get the list of runs and links to their outputs.

As a reminder, please see these instructions for reproducing failures:
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style

I'll try to tag folks who touched the tests most recently, but please
don't hesitate to chime in.


net
---

arp-ndisc-untracked-subnets-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Jaehee Park <redacted>
Cc: Hangbin Liu <redacted>

Times out on debug kernels, passes on non-debug.
This is a real timeout; it eats the full 7200 seconds.

xfrm-policy-sh
~~~~~~~~~~~~~~
To: Hangbin Liu <redacted>

Times out on debug kernels, passes on non-debug.
This is an "inactivity" timeout: the test doesn't print anything
for 900 seconds, so the runner kills it. We can bump the timeout,
but not printing for 15 minutes is bad..
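The report distinguishes a real timeout (arp-ndisc-untracked-subnets-sh uses up the whole 7200 second budget) from an inactivity timeout (xfrm-policy-sh prints nothing for 900 seconds). Below is a minimal Python sketch of a runner that enforces both limits. It is an illustration under stated assumptions, not the NIPA runner's actual code; the function name `run_with_timeouts` and its return values are hypothetical.

```python
import os
import queue
import signal
import subprocess
import threading
import time


def run_with_timeouts(cmd, inactivity_s, total_s):
    """Hypothetical runner: kill cmd if it is silent for inactivity_s
    seconds or runs longer than total_s seconds overall.
    Returns (reason, output_lines)."""
    # New session so the whole process group (test + children) can be killed.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True,
                            start_new_session=True)
    lines = queue.Queue()

    def reader():
        # Forward each output line; None marks end of the output stream.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=reader, daemon=True).start()

    start = time.monotonic()
    last_output = start
    output = []
    while True:
        inactivity_deadline = last_output + inactivity_s
        total_deadline = start + total_s
        deadline = min(inactivity_deadline, total_deadline)
        try:
            line = lines.get(timeout=max(deadline - time.monotonic(), 0))
        except queue.Empty:
            # No output before the nearest deadline: kill the process group.
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass
            proc.wait()
            if total_deadline <= inactivity_deadline:
                return "total-timeout", output
            return "inactivity-timeout", output
        if line is None:
            proc.wait()
            return "exit-%d" % proc.returncode, output
        output.append(line.rstrip("\n"))
        # Any output resets the inactivity clock, but not the total budget.
        last_output = time.monotonic()
```

A test that periodically prints progress never hits the inactivity limit, which is why a silent 15 minute stretch reads as a problem in the test rather than in the runner.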

cmsg-time-sh
~~~~~~~~~~~~
To: Jakub Kicinski <kuba@kernel.org> (forgot I wrote this :D)

Fails randomly.

pmtu-sh
~~~~~~~
To: Simon Horman <horms@kernel.org>

Skipped because it wants the full OVS tooling.
My understanding is that Aaron (CCed) is working on addressing
this problem by allowing the test to run without full OVS tooling.


forwarding
----------

sch-tbf-ets-sh, sch-tbf-prio-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

These fail way too often on non-debug kernels :(
Perhaps we can extend the lower bound?

bridge-igmp-sh, bridge-mld-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@nvidia.com>

On debug kernels they always fail with:

# TEST: IGMPv3 group 239.10.10.10 exclude timeout                     [FAIL]
# Entry 192.0.2.21 has blocked flag failed

For MLD:

# TEST: MLDv2 group ff02::cc exclude timeout                          [FAIL]
# Entry 2001:db8:1::21 has blocked flag failed

vxlan-bridge-1d-sh
~~~~~~~~~~~~~~~~~~
To: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>

Fails almost always, with some form of "Expected to capture 0
packets, got $X".

mirror-gre-lag-lacp-sh
~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

Often fails on debug with:

# TEST: mirror to gretap: LAG first slave (skip_hw)                   [FAIL]
# Expected to capture 10 packets, got 13.

mirror-gre-vlan-bridge-1q-sh, mirror-gre-bridge-1d-vlan-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

Same kind of failure as above, but less often, and on both debug and non-debug.

tc-actions-sh
~~~~~~~~~~~~~
To: Davide Caratti <redacted>

It triggers a random unhandled interrupt, somehow (look at stderr).
It's the only test that does that.


mptcp
-----
To: Matthieu Baerts <matttbe@kernel.org>

simult-flows-sh is still quite flaky :(


nf
--
To: Florian Westphal <fw@strlen.de>

These are skipped because of some compatibility issues:

 nft-flowtable-sh, bridge-brouter-sh, nft-audit-sh

Please LMK if I need to update the CLI tooling.
Or is this a missing kernel config?