Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy
From: Michal Hocko <mhocko@suse.com>
Date: 2021-03-25 17:51:35
Also in:
cgroups, lkml, netdev
On Thu 25-03-21 12:47:04, Johannes Weiner wrote:
On Thu, Mar 25, 2021 at 10:02:28AM +0100, Michal Hocko wrote:quoted
On Wed 24-03-21 15:49:15, Arjun Roy wrote:quoted
On Wed, Mar 24, 2021 at 2:24 PM Johannes Weiner [off-list ref] wrote:quoted
On Wed, Mar 24, 2021 at 10:12:46AM +0100, Michal Hocko wrote:quoted
On Tue 23-03-21 11:47:54, Arjun Roy wrote:quoted
On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko [off-list ref] wrote:quoted
On Wed 17-03-21 18:12:55, Johannes Weiner wrote: [...]quoted
Here is an idea of how it could work: struct page already has struct { /* page_pool used by netstack */ /** * @dma_addr: might require a 64-bit value even on * 32-bit architectures. */ dma_addr_t dma_addr; }; and as you can see from its union neighbors, there is quite a bit more room to store private data necessary for the page pool. When a page's refcount hits zero and it's a networking page, we can feed it back to the page pool instead of the page allocator. From a first look, we should be able to use the PG_owner_priv_1 page flag for network pages (see how this flag is overloaded, we can add a PG_network alias). With this, we can identify the page in __put_page() and __release_page(). These functions are already aware of different types of pages and do their respective cleanup handling. We can similarly make network a first-class citizen and hand pages back to the network allocator from in there.For compound pages we have a concept of destructors. Maybe we can extend that for order-0 pages as well. The struct page is heavily packed and compound_dtor shares the storage without other metadata int pages; /* 16 4 */ unsigned char compound_dtor; /* 16 1 */ atomic_t hpage_pinned_refcount; /* 16 4 */ pgtable_t pmd_huge_pte; /* 16 8 */ void * zone_device_data; /* 16 8 */ But none of those should really require to be valid when a page is freed unless I am missing something. It would really require to check their users whether they can leave the state behind. But if we can establish a contract that compound_dtor can be always valid when a page is freed this would be really a nice and useful abstraction because you wouldn't have to care about the specific type of page.Yeah technically nobody should leave these fields behind, but it sounds pretty awkward to manage an overloaded destructor with a refcounted object: Either every put would have to check ref==1 before to see if it will be the one to free the page, and then set up the destructor before putting the final ref. But that means we can't support lockless tryget() schemes like we have in the page cache with a destructor.I do not follow the ref==1 part. I mean to use the hugetlb model where the destructore is configured for the whole lifetime until the page is freed back to the allocator (see below).That only works if the destructor field doesn't overlap with a member the page type itself doesn't want to use. Page types that do want to use it would need to keep that field exclusive.
Right.
We couldn't use it for LRU pages e.g. because it overlaps with the lru.next pointer.
Dang, I have completely missed this. I was looking at pahole because struct page is unreadable in the C code but I tricked myself to only look at offset 16. The initial set of candidate looked really promissing. But overlapping with list_head is a deal breaker. This makes use of dtor for most order-0 pages indeed unfeasible. Maybe dtor can be rellocated but that is certain a rabbit hole people (rightfully) avoid as much as possible. So you are right and going with networking specific way is more reasonable. [...]
So again, yes it would be nice to have generic destructors, but I just don't see how it's practical.
just to clarify on this. I didn't really mean to use this mechanism to all/most pages I just wanted to have PageHasDestructor rather than PageNetwork because both would express a special nead for freeing but that would require that the dtor would be outside of lru. Thanks! -- Michal Hocko SUSE Labs