Thread (5 messages), 3 authors, 2020-05-04

Re: [PATCH net-next V2] net: sched: fallback to qdisc noqueue if default qdisc setup fail

From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-05-01 19:01:38

On Fri, 1 May 2020 13:56:02 +0200 Jesper Dangaard Brouer wrote:
On Thu, 30 Apr 2020 12:45:49 -0700
Jakub Kicinski [off-list ref] wrote:
On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:  
Currently if the default qdisc setup/init fails, the device ends up with
qdisc "noop", which causes all TX packets to get dropped.

With the introduction of sysctl net/core/default_qdisc it is possible
to change the default qdisc to a more advanced one, which opens up the
possibility that Qdisc_ops->init() can fail.

This patch detects these kinds of failures and falls back to
qdisc "noqueue", which is so simple that its init call will not fail.
This allows the interface to continue functioning.

V2:
As this also captures memory failures, which are transient, the
device is not kept in IFF_NO_QUEUE state.  This allows the net_device
to retry the default qdisc assignment.

Signed-off-by: Jesper Dangaard Brouer <redacted>

I have mixed feelings about this one, I wonder if I'm the only one.
Seems like failure to allocate the default qdisc is pretty critical,
the log message may be missed, especially in the boot time noise.

I think a WARN_ON() is in order here, I'd personally just replace the
netdev_info with a WARN_ON, without the fallback.

It is good that we agree that failure to set up the default qdisc is
pretty critical.  I guess we disagree on whether to (1) keep the network
functioning in a degraded state, or (2) drop all packets on the
net_device so that people notice.

This change proposes (1), keeping the box functioning.  For me it was a
pretty bad experience when I pushed a new kernel over the network
to my embedded box and then lost all network connectivity.  I
fortunately had serial console access (as this was not an OpenWRT box
but a full devel board) so I could debug, but I could no longer upgrade
the kernel.  I clearly noticed, as the box was not operational, but I
guess most people would just give up at this point.  (Imagine a small
OpenWRT box whose config sets default_qdisc to fq_codel, which bricks the
box because it cannot allocate memory.)

I hope that people will notice this degraded state when they start to
transfer data to the device, because running 'noqueue' on a physical
device will result in net_crit_ratelimited() messages like the ones below:

 [86971.609318] Virtual device eth0 asks to queue packet!
 [86971.622183] Virtual device eth0 asks to queue packet!
 [86971.627510] Virtual device eth0 asks to queue packet!

Both ways have advantages, I guess. I don't feel strongly,
but I do think that WARN_ON() is in order here.
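
For readers following the thread, the two options under discussion can be
modeled with a small userspace sketch. This is not kernel code and not the
patch itself: Device, attach_default_qdisc, can_transmit and failing_init
are hypothetical names invented for illustration, and a Python MemoryError
stands in for a failing Qdisc_ops->init(). The log entry stands in for
whichever of netdev_info or WARN_ON() ends up being used.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for a net_device; not a kernel structure."""
    name: str
    qdisc: str = "noop"  # in this model "noop" drops every TX packet
    log: list = field(default_factory=list)


def attach_default_qdisc(dev, default_name, init, fallback=True):
    """Try the configured default qdisc; on failure apply one policy.

    fallback=True  -> option (1): use "noqueue" so the device keeps working
    fallback=False -> option (2): stay on "noop" so all packets are dropped
    Either way a message is recorded so the failure is visible.
    """
    try:
        init()  # hypothetical stand-in for Qdisc_ops->init()
        dev.qdisc = default_name
    except MemoryError as err:
        dev.log.append(f"{dev.name}: default qdisc {default_name} failed: {err}")
        dev.qdisc = "noqueue" if fallback else "noop"
    return dev.qdisc


def can_transmit(dev):
    """Only "noop" drops traffic in this model."""
    return dev.qdisc != "noop"


def failing_init():
    """Simulates a transient allocation failure during qdisc init."""
    raise MemoryError("cannot allocate memory")
```

Calling attach_default_qdisc(dev, "fq_codel", failing_init) leaves the
model device on "noqueue" and still able to transmit, while passing
fallback=False leaves it on "noop". Calling it again later with an init
that succeeds moves the device to "fq_codel", which loosely mirrors the
V2 note about being able to retry the default qdisc assignment.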