Thread (4 messages), 2 authors, 2016-05-31

Re: codel/fq_codel triggers heaps of WARNs in net/sched/sch_hfsc.c:1426

From: Florian Westphal <fw@strlen.de>
Date: 2016-05-31 10:00:56
Subsystem: networking [general], tc subsystem, the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim, Jiri Pirko, Linus Torvalds

Miroslav Kratochvil [off-list ref] wrote:
Hello everyone,

I've been trying to debug an issue that arises when I'm using codel
(or fq_codel) qdiscs attached to an HFSC leaf class. The basic problem
is that at random points in time, the kernel log gets flooded (tens of
MB of messages) with many WARNINGs at net/sched/sch_hfsc.c:1426;
the full text of several is attached below. The warnings appear at
random times, but always in (large) groups.

I thought this might be related to a similar issue with SFQ, which
was fixed by trimming the stats produced by SFQ.
It is documented here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945

A similar patch for codel and fq_codel was recommended to me to try out, here:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/net/sched/sch_fq_codel.c?h=linux-4.5.y&id=01465faa0e2d311512690724196042f9bb466034
but the issue didn't get solved by it.

Also, here is my original Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824790

Is there any good approach I can use to debug this? I currently have a
test system where I can trigger the message easily with any custom kernel;
I'd appreciate any advice on what to try next.

The messages from the test kernel are from 4.5.5 on Debian with ~20k hfsc
classes; I'll try to test 4.6 ASAP, but there seems to be no
relevant change in this direction. The tg3 driver is not to blame (the
same happens with e1000, e1000e, igb and ixgbe). I'm not sure whether u32
filter hashbuckets could trigger this behavior, but I hope not
(currently I have no way to try this without u32).

Thanks in advance for any thoughts on this.

Both HFSC and fq_codel have problems, but I'm not sure if these are
relevant for your 4.5.5 kernel.
I'll submit a hfsc patch soon (it does fix a real problem).

If you have any config knobs enabled on the fq_codel leaf qdiscs it
would be good to know what parameters are used.

Can you try this patch (it doesn't fix anything but might provide more info):
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index d783d7c..045169e 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -49,6 +49,8 @@
  * a class whose fit-time exceeds the current time.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -1423,7 +1425,11 @@ hfsc_schedule_watchdog(struct Qdisc *sch)
 		if (next_time == 0 || next_time > q->root.cl_cfmin)
 			next_time = q->root.cl_cfmin;
 	}
-	WARN_ON(next_time == 0);
+	if (WARN_ON_ONCE(next_time == 0)) {
+		pr_warn_ratelimited("qlen %u droplist_empty: %d, cfmin %llu, minel %d, root_empty %d\n",
+				    sch->q.qlen, list_empty(&q->droplist),
+				    (unsigned long long)q->root.cl_cfmin, !!cl, RB_EMPTY_ROOT(&q->eligible));
+	}
 	qdisc_watchdog_schedule(&q->watchdog, next_time);
 }
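
To make it easier to read the debug output, here is a minimal userspace sketch of the decision the patched function appears to make. It is not kernel code. The struct and function names (hfsc_model, model_next_time, model_would_warn) are hypothetical, and the part above the hunk (taking the minimum element of the eligible tree and using its eligible time) is an assumption, since the patch only shows the cl_cfmin comparison. The two inputs correspond to the "minel" and "cfmin" fields the patch prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduler state the patch reports. */
struct hfsc_model {
	bool     has_eligible;  /* "minel": eligible tree yielded a class (cl != NULL) */
	uint64_t min_eligible;  /* assumed: eligible time of that class, if any */
	uint64_t root_cfmin;    /* "cfmin": q->root.cl_cfmin, 0 means unset */
};

/*
 * Pick the earliest known wakeup time. 0 doubles as "nothing found",
 * which is the value the WARN in the patch fires on.
 */
static uint64_t model_next_time(const struct hfsc_model *m)
{
	uint64_t next_time = 0;

	if (m->has_eligible)
		next_time = m->min_eligible;

	/* Same comparison as the context lines of the hunk. */
	if (m->root_cfmin != 0) {
		if (next_time == 0 || next_time > m->root_cfmin)
			next_time = m->root_cfmin;
	}
	return next_time;
}

/* True when the watchdog would be armed with next_time == 0. */
static bool model_would_warn(const struct hfsc_model *m)
{
	return model_next_time(m) == 0;
}
```

Under these assumptions the warning needs both sources to come up empty: no usable eligible time and cfmin equal to 0. So a log line with "minel 0" and "cfmin 0" while qlen is non-zero would suggest that packets are queued but no class is considered active.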
 