Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy

From: Michal Hocko <mhocko@suse.com>
Date: 2021-03-25 17:51:35
Also in: cgroups, lkml, netdev

On Thu 25-03-21 12:47:04, Johannes Weiner wrote:

On Thu, Mar 25, 2021 at 10:02:28AM +0100, Michal Hocko wrote:

quoted

On Wed 24-03-21 15:49:15, Arjun Roy wrote:

quoted

On Wed, Mar 24, 2021 at 2:24 PM Johannes Weiner [off-list ref] wrote:

quoted

On Wed, Mar 24, 2021 at 10:12:46AM +0100, Michal Hocko wrote:

quoted

On Tue 23-03-21 11:47:54, Arjun Roy wrote:

quoted

On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko [off-list ref] wrote:

quoted

On Wed 17-03-21 18:12:55, Johannes Weiner wrote:
[...]

quoted

Here is an idea of how it could work:

struct page already has

                struct {        /* page_pool used by netstack */
                        /**
                         * @dma_addr: might require a 64-bit value even on
                         * 32-bit architectures.
                         */
                        dma_addr_t dma_addr;
                };

and as you can see from its union neighbors, there is quite a bit more
room to store private data necessary for the page pool.

When a page's refcount hits zero and it's a networking page, we can
feed it back to the page pool instead of the page allocator.

From a first look, we should be able to use the PG_owner_priv_1 page
flag for network pages (see how this flag is overloaded, we can add a
PG_network alias). With this, we can identify the page in __put_page()
and __release_page(). These functions are already aware of different
types of pages and do their respective cleanup handling. We can
similarly make network a first-class citizen and hand pages back to
the network allocator from in there.

For compound pages we have a concept of destructors. Maybe we can extend
that for order-0 pages as well. The struct page is heavily packed and
compound_dtor shares the storage without other metadata
                                        int    pages;    /*    16     4 */
                        unsigned char compound_dtor;     /*    16     1 */
                        atomic_t   hpage_pinned_refcount; /*    16     4 */
                        pgtable_t  pmd_huge_pte;         /*    16     8 */
                        void *     zone_device_data;     /*    16     8 */

But none of those should really require to be valid when a page is freed
unless I am missing something. It would really require to check their
users whether they can leave the state behind. But if we can establish a
contract that compound_dtor can be always valid when a page is freed
this would be really a nice and useful abstraction because you wouldn't
have to care about the specific type of page.

Yeah technically nobody should leave these fields behind, but it
sounds pretty awkward to manage an overloaded destructor with a
refcounted object:

Either every put would have to check ref==1 before to see if it will
be the one to free the page, and then set up the destructor before
putting the final ref. But that means we can't support lockless
tryget() schemes like we have in the page cache with a destructor.

I do not follow the ref==1 part. I mean to use the hugetlb model where
the destructore is configured for the whole lifetime until the page is
freed back to the allocator (see below).

That only works if the destructor field doesn't overlap with a member
the page type itself doesn't want to use. Page types that do want to
use it would need to keep that field exclusive.

Right.

We couldn't use it for LRU pages e.g. because it overlaps with the
lru.next pointer.

Dang, I have completely missed this. I was looking at pahole because
struct page is unreadable in the C code but I tricked myself to only
look at offset 16. The initial set of candidate looked really
promissing. But overlapping with list_head is a deal breaker. This makes
use of dtor for most order-0 pages indeed unfeasible. Maybe dtor can be
rellocated but that is certain a rabbit hole people (rightfully) avoid
as much as possible. So you are right and going with networking specific
way is more reasonable.

[...]

So again, yes it would be nice to have generic destructors, but I just
don't see how it's practical.

just to clarify on this. I didn't really mean to use this mechanism to
all/most pages I just wanted to have PageHasDestructor rather than
PageNetwork because both would express a special nead for freeing but
that would require that the dtor would be outside of lru.

Thanks!
-- 
Michal Hocko
SUSE Labs

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help