Re: [PATCH net-next V2] net: sched: fallback to qdisc noqueue if default qdisc setup fail
From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-05-01 19:01:38
On Fri, 1 May 2020 13:56:02 +0200 Jesper Dangaard Brouer wrote:
> On Thu, 30 Apr 2020 12:45:49 -0700 Jakub Kicinski [off-list ref] wrote:
> > On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:
> > > Currently if the default qdisc setup/init fails, the device ends up
> > > with qdisc "noop", which causes all TX packets to get dropped.
> > >
> > > With the introduction of sysctl net/core/default_qdisc it is possible
> > > to change the default qdisc to be more advanced, which opens up the
> > > possibility that Qdisc_ops->init() can fail.
> > >
> > > This patch detects these kinds of failures and chooses to fall back
> > > to qdisc "noqueue", which is so simple that its init call will not
> > > fail. This allows the interface to continue functioning.
> > >
> > > V2:
> > > As this also captures memory failures, which are transient, the
> > > device is not kept in IFF_NO_QUEUE state. This allows the net_device
> > > to retry the default qdisc assignment.
> > >
> > > Signed-off-by: Jesper Dangaard Brouer <redacted>
> >
> > I have mixed feelings about this one, I wonder if I'm the only one.
> > Seems like failure to allocate the default qdisc is pretty critical,
> > the log message may be missed, especially in the boot time noise.
> >
> > I think a WARN_ON() is in order here, I'd personally just replace the
> > netdev_info with a WARN_ON, without the fallback.
>
> It is good that we agree that failure to set up the default qdisc is
> pretty critical. I guess we disagree on whether (1) we keep the network
> functioning in a degraded state, or (2) we drop all packets on the
> net_device such that people notice.
>
> This change proposes (1), keeping the box functioning. For me it was a
> pretty bad experience that when I pushed a new kernel over the network
> to my embedded box, I lost all network connectivity. I fortunately had
> serial console access (as this was not an OpenWRT box but a full devel
> board) so I could debug, but I could no longer upgrade the kernel. I
> clearly noticed, as the box was not operational, but I guess most
> people would just give up at this point. (Imagine a small OpenWRT box
> whose config sets default_qdisc to fq_codel, which bricks the box as it
> cannot allocate memory.)
>
> I hope that people will notice this degraded state when they start to
> transfer data to the device, because running 'noqueue' on a physical
> device will result in the net_crit_ratelimited() messages below:
>
>  [86971.609318] Virtual device eth0 asks to queue packet!
>  [86971.622183] Virtual device eth0 asks to queue packet!
>  [86971.627510] Virtual device eth0 asks to queue packet!

Both ways have advantages, I guess. I don't feel strongly, but I do think that WARN_ON() is in order here.
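
For illustration only, the fallback behaviour discussed in this thread can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the patch and not kernel code: struct qdisc_ops_model, struct net_device_model, attach_default_qdisc() and the log text are all hypothetical stand-ins, and -12 merely models an -ENOMEM return from Qdisc_ops->init().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct Qdisc_ops. */
struct qdisc_ops_model {
	const char *id;
	int (*init)(void);	/* 0 on success, negative errno-style value on failure */
};

/* Hypothetical stand-in for a net_device; only tracks the attached qdisc. */
struct net_device_model {
	const char *name;
	const struct qdisc_ops_model *qdisc;
};

static int init_ok(void)     { return 0; }
static int init_enomem(void) { return -12; }	/* models -ENOMEM */

/* "noqueue" is modelled as a qdisc whose init cannot fail. */
static const struct qdisc_ops_model noqueue_ops     = { "noqueue",  init_ok };
static const struct qdisc_ops_model fq_codel_ok     = { "fq_codel", init_ok };
static const struct qdisc_ops_model fq_codel_enomem = { "fq_codel", init_enomem };

/*
 * Try to attach the configured default qdisc. If its init fails, log it
 * and fall back to "noqueue" instead of leaving the device unusable.
 * Nothing is remembered between calls, so a later call retries the
 * default (the V2 point about transient memory failures).
 * Returns 1 if the fallback was taken, 0 otherwise.
 */
int attach_default_qdisc(struct net_device_model *dev,
			 const struct qdisc_ops_model *def)
{
	int err;

	assert(dev != NULL && def != NULL);

	err = def->init();
	if (err == 0) {
		dev->qdisc = def;
		return 0;
	}

	/* Illustrative message only; not the wording used by the patch. */
	fprintf(stderr, "%s: init of default qdisc %s failed (%d), using %s\n",
		dev->name, def->id, err, noqueue_ops.id);
	dev->qdisc = &noqueue_ops;
	return 1;
}
```

Calling attach_default_qdisc() with fq_codel_enomem leaves the model device on "noqueue"; calling it again with fq_codel_ok attaches "fq_codel". The alternative argued for above would replace the fallback branch with a loud warning and leave the device without a working qdisc.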