Re: [RFC PATCH net-next] sock: Propose socket.urgent for sockmem isolation
From: Shakeel Butt <hidden>
Date: 2023-06-09 17:54:13
Also in: cgroups, linux-mm, lkml
On Fri, Jun 9, 2023 at 2:07 PM Eric Dumazet [off-list ref] wrote:
> On Fri, Jun 9, 2023 at 10:28 AM Abel Wu [off-list ref] wrote:
> > This is just a PoC patch intended to resume the discussion about tcpmem isolation opened by Google at LPC'22 [1].
> >
> > We are facing the same problem: the global shared threshold can cause isolation issues. Low priority jobs can hog TCP memory and adversely impact higher priority jobs. What's worse is that these low priority jobs usually have smaller cpu weights, leading to poor ability to consume rx data.
> >
> > To tackle this problem, an interface for the non-root cgroup memory controller named 'socket.urgent' is proposed. It determines whether the sockets of this cgroup and its descendants can escape from the constraints or not under global socket memory pressure.
> >
> > The 'urgent' semantics will not take effect under memcg pressure in order to protect against worse memstalls, so the behavior will be the same as before without this patch.
> >
> > This proposal doesn't remove protocol's threshold as we found it useful in restraining memory defragment. As aforementioned, the low priority jobs can hog lots of memory, which is unreclaimable and unmovable, for some time due to small cpu weight.
> >
> > So in practice we allow high priority jobs with net-memcg accounting enabled to escape the global constraints if the net-memcg itself is not under pressure. For lower priority jobs, the budget will be tightened as the memory usage of 'urgent' jobs increases. In this way we can finally achieve:
> >
> > - Important jobs won't suffer priority inversion from the background jobs in terms of socket memory pressure/limit.
> >
> > - Global constraints are still effective, but only on non-urgent jobs, useful for admins on policy decision on defrag.
> >
> > Comments/Ideas are welcomed, thanks!
>
> This seems to go in the complete opposite direction of what memcg promises.
>
> Can we fix memcg, so that:
>
> Each group can use the memory it was provisioned (this includes TCP buffers)
>
> Global tcp_memory can disappear (set tcp_mem to infinity)
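For reference, the 'socket.urgent' semantics described in the quoted proposal can be read roughly as the small user-space model below. This is only a sketch of the description above, not the patch itself: the `Cgroup` class, its fields, and `constrained_by_global_limit` are hypothetical names, and checking only the cgroup's own memcg pressure flag is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cgroup:
    """Hypothetical stand-in for a memory cgroup (not a kernel structure)."""
    name: str
    urgent: bool = False                # models the proposed 'socket.urgent' knob
    under_memcg_pressure: bool = False  # assumption: own memcg pressure only
    parent: Optional["Cgroup"] = None


def is_urgent(cg: Optional[Cgroup]) -> bool:
    """The knob covers the cgroup and its descendants, so walk up the tree."""
    while cg is not None:
        if cg.urgent:
            return True
        cg = cg.parent
    return False


def constrained_by_global_limit(cg: Cgroup, global_pressure: bool) -> bool:
    """Return True if the cgroup's sockets are held to the global constraints."""
    if not global_pressure:
        return False  # no global socket memory pressure, nothing to enforce
    if cg.under_memcg_pressure:
        return True   # 'urgent' does not take effect under memcg pressure
    return not is_urgent(cg)
```

In this model a descendant of an urgent cgroup escapes the global constraints only while its memcg is not under pressure; every other cgroup behaves as it does today.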
I agree with Eric and this is exactly how we at Google overcome the isolation issue. We have set tcp_mem to unlimited and enabled memcg accounting of network memory (by surgically incorporating v2 semantics of network memory accounting in our v1 environment). I do have one question though:
> > This proposal doesn't remove protocol's threshold as we found it useful in restraining memory defragment.
Can you explain how you find the global tcp limit useful? What does memory defragment mean?
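For context on the global limit in question: the tcp_mem sysctl is a vector of three integers (min, pressure, max), counted in pages. The sketch below parses that format so the current thresholds can be inspected. The helper names and the default 4096-byte page size are assumptions for illustration, and nothing here describes the actual configuration used at Google.

```python
from typing import NamedTuple


class TcpMem(NamedTuple):
    """The three tcp_mem thresholds, all counted in pages."""
    min_pages: int
    pressure_pages: int
    max_pages: int


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the whitespace-separated contents of the tcp_mem sysctl."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return TcpMem(*(int(f) for f in fields))


def pages_to_bytes(pages: int, page_size: int = 4096) -> int:
    """Convert a page count to bytes; page_size is an assumed default."""
    return pages * page_size


def read_tcp_mem(path: str = "/proc/sys/net/ipv4/tcp_mem") -> TcpMem:
    """Read the live values on a Linux host."""
    with open(path) as f:
        return parse_tcp_mem(f.read())
```

On a Linux host, `read_tcp_mem()` returns the live thresholds, and `pages_to_bytes(read_tcp_mem().max_pages)` gives the hard limit in bytes under the assumed page size.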