Re: [PATCH RFC v2 net-next 0/5] net: Qdisc backpressure infrastructure
From: Eric Dumazet <edumazet@google.com>
Date: 2022-08-22 16:22:56
Also in:
linux-doc, lkml
On Mon, Aug 22, 2022 at 2:10 AM Peilin Ye [off-list ref] wrote:
From: Peilin Ye <redacted>
Hi all,
Currently sockets (especially UDP ones) can drop a lot of packets at TC
egress when rate limited by shaper Qdiscs like HTB. This patch series
tries to solve this by introducing a Qdisc backpressure mechanism.
RFC v1 [1] used a throttle & unthrottle approach, which introduced several
issues, including a thundering herd problem and a socket reference count
issue [2]. This RFC v2 uses a different approach to avoid those issues:
1. When a shaper Qdisc drops a packet that belongs to a local socket due
to TC egress congestion, we make part of the socket's sndbuf
temporarily unavailable, so it sends slower.
2. Later, when TC egress becomes idle again, we gradually recover the
socket's sndbuf back to normal. Patch 2 implements this step using a
timer for UDP sockets.
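The two steps above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the code from the patches: the struct, the function names, the "withhold half of what is left" policy, the minimum floor and the fixed recovery step are all hypothetical, since the cover letter does not give those details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket state; field names are illustrative only. */
struct bp_sock {
	size_t sndbuf;    /* configured send buffer size, in bytes */
	size_t overlimit; /* part of sndbuf temporarily made unavailable */
};

/* Bytes of send buffer the socket may currently use. */
static size_t bp_avail(const struct bp_sock *sk)
{
	assert(sk->overlimit <= sk->sndbuf);
	return sk->sndbuf - sk->overlimit;
}

/*
 * Step 1: a shaper Qdisc dropped a packet that belongs to this socket.
 * Assumed policy: withhold half of the currently available space, but
 * always leave at least min_avail bytes usable.
 */
static void bp_on_drop(struct bp_sock *sk, size_t min_avail)
{
	size_t avail = bp_avail(sk);
	size_t cut;

	if (avail <= min_avail)
		return;

	cut = avail / 2;
	if (avail - cut < min_avail)
		cut = avail - min_avail;

	sk->overlimit += cut;
}

/*
 * Step 2: periodic timer while TC egress is idle.
 * Assumed policy: give back at most step bytes per tick, so the
 * socket recovers gradually rather than all at once.
 */
static void bp_on_timer(struct bp_sock *sk, size_t step)
{
	size_t back = sk->overlimit < step ? sk->overlimit : step;

	sk->overlimit -= back;
}
```

In this model each socket shrinks and recovers on its own, so nothing has to wake every throttled socket at once, and the Qdisc keeps no list of sockets.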
The thundering herd problem is avoided, since we no longer wake up all
throttled sockets at the same time in qdisc_watchdog(). The socket
reference count issue is also avoided, since we no longer maintain a socket
list on the Qdisc.
Performance is better than RFC v1. There is one concern about fairness
between flows for TBF Qdisc, which could be solved by using an SFQ inner
Qdisc.
Please see the individual patches for details and numbers. Any comments or
suggestions would be much appreciated. Thanks!
[1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@bytedance.com/
[2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/
Peilin Ye (5):
net: Introduce Qdisc backpressure infrastructure
net/udp: Implement Qdisc backpressure algorithm
net/sched: sch_tbf: Use Qdisc backpressure infrastructure
net/sched: sch_htb: Use Qdisc backpressure infrastructure
net/sched: sch_cbq: Use Qdisc backpressure infrastructure

I think the whole idea is wrong.

Packet schedulers can be remote (offloaded, or on another box).

The idea of going back to socket level from a packet scheduler should
really be a last resort.

The issue of UDP sockets being able to flood a network is tough; I am not
sure the core networking stack should pretend it can solve it.

Note that FQ-based packet schedulers can also help already.