Re: [TEST] Flake report
From: Simon Horman <horms@kernel.org>
Date: 2024-05-11 13:27:29
+ Aaron

On Thu, May 09, 2024 at 04:09:58PM -0700, Jakub Kicinski wrote:
Hi!

Feels like the efforts to get rid of flaky tests have slowed down a bit,
so I thought I'd poke people..

Here's the full list:
https://netdev.bots.linux.dev/flakes.html?min-flip=0&pw-y=0
click on test name to get the list of runs and links to outputs.

As a reminder please see these instructions for repro:
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style

I'll try to tag folks who touched the tests most recently, but please
don't hesitate to chime in.

net
---

arp-ndisc-untracked-subnets-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Jaehee Park <redacted>
Cc: Hangbin Liu <redacted>

Times out on debug kernels, passes on non-debug.
This is a real timeout, eats full 7200 seconds.

xfrm-policy-sh
~~~~~~~~~~~~~~
To: Hangbin Liu <redacted>

Times out on debug kernels, passes on non-debug. This is an "inactivity"
timeout, test doesn't print anything for 900 seconds so the runner
kills it. We can bump the timeout but not printing for 15min is bad..

cmsg-time-sh
~~~~~~~~~~~~
To: Jakub Kicinski <kuba@kernel.org> (forgot I wrote this :D)

Fails randomly.

pmtu-sh
~~~~~~~
To: Simon Horman <horms@kernel.org>

Skipped because it wants full OVS tooling.
My understanding is that Aaron (CCed) is working on addressing this problem by allowing the test to run without full OVS tooling.
forwarding
----------

sch-tbf-ets-sh, sch-tbf-prio-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

These fail way too often on non-debug kernels :(
Perhaps we can extend the lower bound?

bridge-igmp-sh, bridge-mld-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@nvidia.com>

On debug kernels it always fails with:

  # TEST: IGMPv3 group 239.10.10.10 exclude timeout [FAIL]
  # Entry 192.0.2.21 has blocked flag failed

For MLD:

  # TEST: MLDv2 group ff02::cc exclude timeout [FAIL]
  # Entry 2001:db8:1::21 has blocked flag failed

vxlan-bridge-1d-sh
~~~~~~~~~~~~~~~~~~
To: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>

Flake fails almost always, with some form of "Expected to capture 0
packets, got $X"

mirror-gre-lag-lacp-sh
~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

Often fails on debug with:

  # TEST: mirror to gretap: LAG first slave (skip_hw) [FAIL]
  # Expected to capture 10 packets, got 13.

mirror-gre-vlan-bridge-1q-sh, mirror-gre-bridge-1d-vlan-sh
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: Petr Machata <petrm@nvidia.com>

Same kind of failure as above but less often and both on debug and
non-debug.

tc-actions-sh
~~~~~~~~~~~~~
To: Davide Caratti <redacted>

It triggers a random unhandled interrupt, somehow (look at stderr).
It's the only test that does that.

mptcp
-----
To: Matthieu Baerts <matttbe@kernel.org>

simult-flows-sh is still quite flaky :(

nf
--
To: Florian Westphal <fw@strlen.de>

These are skipped because of some compatibility issues:

  nft-flowtable-sh, bridge-brouter-sh, nft-audit-sh

Please LMK if I need to update the CLI tooling.
Or is this missing kernel config?
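
For anyone unsure about the difference between the two timeouts in the
report (the "real" 7200 second limit that arp-ndisc-untracked-subnets-sh
hits and the 900 second "inactivity" limit that xfrm-policy-sh hits),
here is a rough sketch of how a runner could enforce both. This is not
the actual NIPA runner code: the function name, its parameters and the
verdict strings are made up for illustration, and it assumes a POSIX
system where pipes can be polled.

```python
import os
import selectors
import subprocess
import time


def run_with_timeouts(cmd, inactivity_s=900.0, total_s=7200.0):
    """Run cmd, killing it if it is silent too long or runs too long.

    Hypothetical helper, not taken from NIPA. Returns (verdict, output),
    where verdict is "exited", "inactivity" or "total".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    start = last_output = time.monotonic()
    chunks = []
    verdict = "exited"
    try:
        while True:
            now = time.monotonic()
            if now - start >= total_s:
                verdict = "total"        # the "real" timeout
                break
            if now - last_output >= inactivity_s:
                verdict = "inactivity"   # nothing printed for too long
                break
            # Sleep until output arrives or the nearer deadline passes.
            wait = min(inactivity_s - (now - last_output),
                       total_s - (now - start))
            if not sel.select(timeout=wait):
                continue                 # no output; re-check deadlines
            data = os.read(proc.stdout.fileno(), 4096)
            if not data:                 # EOF: the test closed its output
                break
            chunks.append(data)
            last_output = time.monotonic()  # any output resets the clock
    finally:
        sel.close()
        if verdict != "exited":
            proc.kill()
        proc.wait()
        proc.stdout.close()
    return verdict, b"".join(chunks).decode(errors="replace")
```

With a scheme like this, a test that prints a progress line every so
often only ever runs into the overall limit, which is why staying silent
for 15 minutes is worth fixing in the test rather than just raising the
inactivity limit.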