RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
From: Zhang, Cathy <hidden>
Date: 2023-05-09 10:40:03
-----Original Message-----
From: Paolo Abeni <pabeni@redhat.com>
Sent: Tuesday, May 9, 2023 5:51 PM
To: Zhang, Cathy <redacted>; edumazet@google.com; davem@davemloft.net; kuba@kernel.org
Cc: Brandeburg, Jesse <redacted>; Srinivas, Suresh [off-list ref]; Chen, Tim C [off-list ref]; You, Lizhen [off-list ref]; eric.dumazet@gmail.com; netdev@vger.kernel.org
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

> On Sun, 2023-05-07 at 19:08 -0700, Cathy Zhang wrote:
>> Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible"), each TCP socket could forward allocate up to 2 MB of memory, and tcp_memory_allocated might hit the TCP memory limit quite soon. To reduce the memory pressure, that commit keeps sk->sk_forward_alloc as small as possible, which will be less than one page if SO_RESERVE_MEM is not specified.
>>
>> However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible"), memcg charge hot paths are observed while the system is stressed with a large number of connections. That is because sk->sk_forward_alloc is too small and is always less than truesize, so network handlers like tcp_rcv_established() have to jump to the slow path more frequently to increase sk->sk_forward_alloc. Each such memory allocation triggers a memcg charge, and perf top shows the following contention paths on the busy system:
>>
>> 16.77%  [kernel]  [k] page_counter_try_charge
>> 16.56%  [kernel]  [k] page_counter_cancel
>> 15.65%  [kernel]  [k] try_charge_memcg
>
> I'm guessing you hit memcg limits frequently. I'm wondering if it's just a matter of tuning/reducing tcp limits in /proc/sys/net/ipv4/tcp_mem.
Hi Paolo,

Do you mean hitting the limit set by "--memory" when starting the container? If the memory option is not specified when a container is initialized, cgroup2 creates a memcg with no memory limit on the system, right? We've run the test without this setting, and the memcg charge hot paths still exist.

It seems that /proc/sys/net/ipv4/tcp_[wr]mem cannot be changed with a simple echo, but requires a change to /etc/sysctl.conf, and I'm not sure whether it can be changed without stopping the running application. Additionally, could this kind of change have a deeper and more complex impact on the network stack than reclaim_threshold, which is expected to mainly affect the memory allocation paths? With this in mind, we decided to add reclaim_threshold directly.
> Cheers,
> Paolo