On 10/1/19 8:48 AM, John Ousterhout wrote:
On Mon, Sep 30, 2019 at 6:53 PM Eric Dumazet [off-list ref] wrote:
On 9/30/19 5:41 PM, John Ousterhout wrote:
On Mon, Sep 30, 2019 at 5:14 PM Eric Dumazet [off-list ref] wrote:
On 9/30/19 4:58 PM, John Ousterhout wrote:
As of 4.16.10, it appears to me that sk->sk_backlog.len does not
provide an accurate estimate of backlog length; this reduces the
usefulness of the "limit" argument to sk_add_backlog.
The problem is that, under heavy load, sk->sk_backlog.len can grow
arbitrarily large, even though the actual amount of data in the
backlog is small. This happens because __release_sock doesn't reset
the backlog length until it gets completely caught up. Under heavy
load, new packets can be arriving continuously into the backlog
(which increases sk_backlog.len) while other packets are being
serviced. This can go on forever, so sk_backlog.len never gets reset
and it can become arbitrarily large.
Certainly not.
It cannot grow arbitrarily large, unless perhaps a backport went wrong.
Can you help me understand what would limit the growth of this value?
Suppose that new packets are arriving as quickly as they are
processed. Every time __release_sock calls sk_backlog_rcv, a new
packet arrives during the call, which is added to the backlog,
incrementing sk_backlog.len. However, sk_backlog.len doesn't get
decreased when sk_backlog_rcv completes, since the backlog hasn't
emptied (as you said, it's not "safe"). As a result, sk_backlog.len
has increased, but the actual backlog length is unchanged (one packet
was added, one was removed). Why can't this process repeat
indefinitely, until eventually sk_backlog.len reaches whatever limit
the transport specifies when it invokes sk_add_backlog? At this point
packets will be dropped by the transport even though the backlog isn't
actually very large.
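The scenario described above can be tried out in a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
struct backlog_model, model_add, model_process_one and run_model are
hypothetical names, the limit check is loosely modelled on
sk_add_backlog and ignores sk_rmem_alloc, and every packet is assumed
to have the same truesize. The counter is cleared only when the
modelled queue drains completely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the backlog accounting; not kernel code. */
struct backlog_model {
    size_t queued_bytes;  /* bytes actually waiting in the backlog */
    size_t len;           /* counter modelled on sk_backlog.len */
    size_t dropped;       /* packets rejected by the limit check */
};

/* Assumed limit check: reject the packet once the counter exceeds the limit. */
static int model_add(struct backlog_model *b, size_t truesize, size_t limit)
{
    if (b->len > limit) {
        b->dropped++;
        return -1;
    }
    b->queued_bytes += truesize;
    b->len += truesize;
    return 0;
}

/* Service one packet; the counter is reset only when the queue is empty. */
static void model_process_one(struct backlog_model *b, size_t truesize)
{
    assert(b->queued_bytes >= truesize);
    b->queued_bytes -= truesize;
    if (b->queued_bytes == 0)
        b->len = 0;
}

/*
 * One packet is waiting at the start, and one new packet arrives during
 * each service call. Records the state seen at the first drop and
 * returns the total number of drops.
 */
static size_t run_model(size_t truesize, size_t limit, size_t rounds,
                        size_t *queued_at_first_drop,
                        size_t *len_at_first_drop)
{
    struct backlog_model b = {0, 0, 0};
    size_t i;

    model_add(&b, truesize, limit);
    for (i = 0; i < rounds; i++) {
        /* A packet arrives while the previous one is being serviced. */
        if (model_add(&b, truesize, limit) < 0 && b.dropped == 1) {
            *queued_at_first_drop = b.queued_bytes;
            *len_at_first_drop = b.len;
        }
        model_process_one(&b, truesize);
    }
    return b.dropped;
}
```

With a truesize of 1000, a limit of 10000 and 20 rounds, this model
drops its first packet when the counter reads 11000 while only 1000
bytes are actually queued. The queue then drains, the counter is
cleared, and no further packets are dropped in the remaining rounds.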
The process is bounded by socket sk_rcvbuf + sk_sndbuf
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
{
        u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
        ...
        if (unlikely(sk_add_backlog(sk, skb, limit))) {
                ...
                __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
                ...
}
Once the limit is reached, sk_backlog.len won't be touched, unless __release_sock()
has processed the whole queue.
Sorry if I'm missing something obvious here, but when you say
"sk_backlog.len won't be touched", doesn't that mean that incoming
packets will have to be dropped?
Yes, packets are dropped if the socket has exhausted its memory budget.
Presumably the sender is trying to fool us.
And can't this occur even though the
true size of the backlog might be way less than sk_rcvbuf + sk_sndbuf,
as I described above? It seems to me that the basic problem is that
sk_backlog.len could exceed any given limit, even though there aren't
actually that many bytes still left in the backlog.
Sorry, I have no idea what problem you see.