Thread (3 messages), 3 authors, 2021-03-22

Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy

From: Arjun Roy <hidden>
Date: 2021-03-22 21:20:20
Also in: cgroups, linux-mm, lkml

On Wed, Mar 17, 2021 at 8:21 PM Andrew Morton [off-list ref] wrote:
On Mon, 15 Mar 2021 18:30:03 -0700 Arjun Roy [off-list ref] wrote:
From: Arjun Roy <redacted>

TCP zerocopy receive is used by high performance network applications
to further scale. For RX zerocopy, the memory containing the network
data filled by the network driver is directly mapped into the address
space of high performance applications. To keep the TLB cost low,
these applications unmap the network memory in big batches, so this
memory can remain mapped for a long time. This can cause a memory
isolation issue, as this memory becomes unaccounted after getting
mapped into the application address space. This patch adds the memcg
accounting for such memory.

Accounting the network memory comes with its own unique challenges.
The high performance NIC drivers use page pooling to reuse the pages
to eliminate/reduce expensive setup steps like IOMMU. These drivers
keep an extra reference on the pages and thus we cannot depend on the
page reference for the uncharging. The page in the pool may keep a
memcg pinned for an arbitrarily long time or may get used by another
memcg.

This patch decouples the uncharging of the page from the refcnt and
associates it with the map count, i.e. the page gets uncharged when the
last address space unmaps it. Now the question is, what if the driver
drops its reference while the page is still mapped? That is fine, as
the address space also holds a reference to the page, i.e. the
reference count cannot drop to zero before the map count.
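
The map-count rule described above can be illustrated with a small
user-space toy model. This is only a sketch and not the patch's code:
every name here (toy_page, toy_memcg, toy_map and so on) is
hypothetical, and charging on the first map is an assumption made for
the illustration; the commit message only states when the uncharge
happens.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_memcg {
    long charged_pages;          /* pages currently charged to this group */
};

struct toy_page {
    int refcount;                /* driver pool + every mapping hold one */
    int mapcount;                /* number of address spaces mapping it */
    struct toy_memcg *memcg;     /* NULL while the page is uncharged */
};

/* The driver's page pool takes the initial reference. */
static void toy_driver_alloc(struct toy_page *p)
{
    p->refcount = 1;
    p->mapcount = 0;
    p->memcg = NULL;
}

/* Mapping takes a reference; the first mapping charges (assumption). */
static void toy_map(struct toy_page *p, struct toy_memcg *cg)
{
    p->refcount++;
    if (p->mapcount++ == 0) {
        p->memcg = cg;
        cg->charged_pages++;
    }
}

/* The last unmap uncharges, regardless of who else holds references. */
static void toy_unmap(struct toy_page *p)
{
    assert(p->mapcount > 0);
    if (--p->mapcount == 0) {
        p->memcg->charged_pages--;
        p->memcg = NULL;
    }
    p->refcount--;               /* drop the mapping's reference last */
}

/* The driver may drop its reference at any time. */
static void toy_driver_put(struct toy_page *p)
{
    p->refcount--;
}
```

In this model a page that is mapped once and then released by the
driver still has a reference count of one, so the charge stays in
place until toy_unmap() brings the map count to zero.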

What tree were you hoping to get this merged through?  I'd suggest
net - it's more likely to get tested over there.

That was one part I wasn't quite sure about - the v3 patchset makes
things even less clear, since, while v1/v2 are mostly mm-heavy, v3
would have some significant changes in both subsystems.

I'm open to whichever is the "right" way to go, but am not currently
certain which would be.

Thanks,
-Arjun
...
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
These changes could be inside #ifdef CONFIG_NET.  Although I expect
MEMCG=y&&NET=n is pretty damn rare.
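
The guard being suggested follows a common C pattern: the real
function is built only when the option is defined, and a no-op stub
takes its place otherwise so callers need no conditionals. The sketch
below is a stand-alone illustration; toy_charge_sock_page is a
hypothetical name and is not taken from the patch.

```c
/* Stand-alone illustration of a config guard. Hypothetical names. */
#include <assert.h>

#ifdef CONFIG_NET
/* Real version: compiled only when networking support is enabled. */
static int toy_charge_sock_page(long *counter)
{
    (*counter)++;
    return 0;
}
#else
/* Stub version: does nothing, so callers compile unchanged. */
static inline int toy_charge_sock_page(long *counter)
{
    (void)counter;
    return 0;
}
#endif
```

Building with -DCONFIG_NET selects the real version; without it the
stub is used and the counter is left untouched.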