Re: [PATCH v2 2/3] mm: page_counter: rearrange struct page_counter fields

From: Andrew Morton <akpm@linux-foundation.org>
Date: 2022-08-25 00:33:38
Also in: cgroups, linux-mm, lkml, oe-lkp

On Thu, 25 Aug 2022 00:05:05 +0000 Shakeel Butt wrote:

With memcg v2 enabled, memcg->memory.usage is a very hot member for
workloads doing memcg charging on multiple CPUs concurrently,
particularly network-intensive workloads. In addition, there is
false sharing between memory.usage and memory.high on the charge
path. This patch moves usage into a separate cacheline and moves all
the read-mostly fields into a separate cacheline.
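
To make the false sharing concrete, here is a minimal userspace sketch, not the
actual struct page_counter. The struct names, the field subset, and the 64-byte
cacheline size are assumptions for illustration, and it uses C11 alignas rather
than the kernel's own alignment annotations.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cacheline size; the real value is arch-specific */

/* Hypothetical "before" layout: write-hot usage sits next to read-mostly high. */
struct counter_packed {
	long usage;         /* written on every charge/uncharge */
	unsigned long high; /* only read on the charge path */
	unsigned long max;
};

/* Hypothetical "after" layout: usage gets its own cacheline. */
struct counter_split {
	alignas(CACHELINE) long usage;
	alignas(CACHELINE) unsigned long high; /* read-mostly fields start here */
	unsigned long max;
};

/* Nonzero if the members at byte offsets a and b share a cacheline. */
static int same_cacheline(size_t a, size_t b)
{
	return a / CACHELINE == b / CACHELINE;
}

/* The split layout costs space: it occupies two full cachelines. */
static_assert(sizeof(struct counter_split) == 2 * CACHELINE,
	      "split layout should span exactly two cachelines");
```

In the packed layout, every write to usage invalidates the line that readers of
high also need; in the split layout it does not, at the cost of a larger struct.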

To evaluate the impact of this optimization, on a 72-CPU machine, we
ran the following workload in a three-level cgroup hierarchy.

 $ netserver -6
 # 36 instances of netperf with the following params
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

Results (average throughput of netperf):
Without (6.0-rc1)	10482.7 Mbps
With patch		12413.7 Mbps (18.4% improvement)

With the patch, the throughput improved by 18.4%.
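
The 18.4% figure follows directly from the two throughput numbers above. A small
sketch of the arithmetic (percent_change is a hypothetical helper name, not
something from the patch):

```c
/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

percent_change(10482.7, 12413.7) evaluates to roughly 18.42, i.e. a gain of
about 1931 Mbps over the 6.0-rc1 baseline.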

One side effect of this patch is an increase in the size of struct
mem_cgroup. For example, with this patch on a 64-bit build, the size of
struct mem_cgroup increased from 4032 bytes to 4416 bytes. However, for
the performance improvement, this additional size is worth it. In
addition, there are opportunities to reduce the size of struct
mem_cgroup, such as deprecating the kmem and tcpmem page counters and
better packing.
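
To put the size increase in cacheline terms, here is a rough sketch assuming
64-byte cachelines (an assumption; the real size is architecture-specific). The
helper names are hypothetical.

```c
#include <stddef.h>

#define CACHELINE_BYTES 64 /* assumption: typical on x86-64, arch-specific */

/* Growth of a struct in bytes. */
static size_t bytes_added(size_t old_size, size_t new_size)
{
	return new_size - old_size;
}

/* Number of cachelines needed to hold `bytes`, rounded up. */
static size_t cachelines(size_t bytes)
{
	return (bytes + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}
```

With the sizes quoted above, bytes_added(4032, 4416) is 384, which is six
64-byte cachelines, taking the struct from 63 to 69 cachelines under this
assumption.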

Did you evaluate the effects of using a per-cpu counter of some form?
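
For readers unfamiliar with the idea, a batched per-cpu counter has roughly the
following shape. This is a single-threaded userspace sketch under stated
assumptions, not the kernel's percpu_counter implementation: NR_CPUS_SKETCH,
BATCH, and all pcp_* names are hypothetical, and real code would keep each
per-cpu slot in its own cacheline and only touch it from that CPU.

```c
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */
#define BATCH 32         /* hypothetical flush threshold */

struct pcp_counter {
	atomic_long global;         /* shared, touched only on flush */
	long local[NR_CPUS_SKETCH]; /* per-cpu deltas not yet flushed */
};

/* Add n on behalf of `cpu`; fold into the shared total once |delta| >= BATCH. */
static void pcp_add(struct pcp_counter *c, int cpu, long n)
{
	long v = c->local[cpu] + n;

	if (v >= BATCH || v <= -BATCH) {
		atomic_fetch_add(&c->global, v);
		v = 0;
	}
	c->local[cpu] = v;
}

/* Cheap read: ignores unflushed per-cpu deltas, so it can lag. */
static long pcp_read_approx(struct pcp_counter *c)
{
	return atomic_load(&c->global);
}

/* Exact read: walks every per-cpu slot, so it costs O(number of CPUs). */
static long pcp_sum(struct pcp_counter *c)
{
	long s = atomic_load(&c->global);

	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		s += c->local[i];
	return s;
}
```

The trade-off in this sketch is that the cheap read can be off by up to
NR_CPUS_SKETCH * (BATCH - 1) in either direction, which matters when the value
is compared against a limit on every charge.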
