Re: [PATCH 1/5] mm: Add support for unaccepted memory
From: Kirill A. Shutemov <hidden>
Date: 2021-08-12 21:08:38
Also in:
linux-coco, lkml
On Tue, Aug 10, 2021 at 01:50:57PM -0700, Dave Hansen wrote:
On 8/10/21 11:13 AM, Dave Hansen wrote:quoted
quoted
@@ -1001,6 +1004,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, if (page_reported(page)) __ClearPageReported(page); + if (PageOffline(page)) + clear_page_offline(page, order); + list_del(&page->lru); __ClearPageBuddy(page); set_page_private(page, 0);So, this is right in the fast path of the page allocator. It's a one-time thing per 2M page, so it's not permanent. *But* there's both a global spinlock and a firmware call hidden in clear_page_offline(). That's *GOT* to hurt if you were, for instance, running a benchmark while this code path is being tickled. Not just to That could be just downright catastrophic for scalability, albeit temporarily.One more thing... How long are these calls? You have to make at least 512 calls into the SEAM module. Assuming they're syscall-ish, so ~1,000 cycles each, that's ~500,000 cycles, even if we ignore the actual time it takes to zero that 2MB worth of memory and all other overhead within the SEAM module.
I hope to get away with 2 calls per 2M: one MapGPA and one TDACCEPTPAGE (or 3 for MAXORDER -- 4M -- pages). I don't have any numbers yet.
So, we're sitting on one CPU with interrupts off, blocking all the other CPUs from doing page allocation in this zone.
I agree that's not good. Let's see if it's going to be okay with accepting in 2M chunks.
Then, we're holding a global lock which prevents any other NUMA nodes from accepting pages.
Looking at this again, the global lock is aviodable: the caller owns the pfn range so nobody can touch these bits in the bitmap. We can replace bitmap_clear() with atomic clear_bit() loop and drop the lock completely.
If the other node happens to *try* to do an accept, it will sit with its zone lock held waiting for this one.
Maybe nobody will ever notice. But, it seems like an awfully big risk to me. I'd at least *try* do these calls outside of the zone lock. Then the collateral damage will at least be limited to things doing accepts rather than all zone->lock users. Couldn't we delay the acceptance to, say the place where we've dropped the zone->lock and do the __GFP_ZERO memset() like at prep_new_page()? Or is there some concern that the page has been split at that point?
It *will* be split by the point. Like if you ask for order-0 page and you
don't any left page allocator will try higher orders until finds anything.
On order-9 it would hit unaccepted. At that point the page going to split
and put on the free lists accordingly. That's all happens under zone lock.
__rmqueue_smallest ->
del_page_from_free_list()
expand()
I guess that makes it more complicated because you might have a 4k page but you need to go accept a 2M page. You might end up having to check the bitmap 511 more times because you might see 511 more PageOffline() pages come through. You shouldn't even need the bitmap lock to read since it's a one-way trip from unaccepted->accepted.
Yeah. Unless we don't want to flip it back on making the range share. I think we do. Otherwise it will cause problems for kexec. -- Kirill A. Shutemov