Re: [PATCH 1/5] mm: Add support for unaccepted memory

[PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
[PATCH 2/5] efi/x86: Implement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-12
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 2/5] efi/x86: Implement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
[PATCH 3/5] x86/boot/compressed: Handle unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
[PATCH 4/5] x86/mm: Provide helpers for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 4/5] x86/mm: Provide helpers for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 4/5] x86/mm: Provide helpers for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
[PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · David Hildenbrand <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · David Hildenbrand <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Andi Kleen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Andi Kleen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Andi Kleen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Joerg Roedel <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Joerg Roedel <hidden> · 2021-08-13
Re: [PATCH 1/5] mm: Add support for unaccepted memory · David Hildenbrand <hidden> · 2021-08-17
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Joerg Roedel <hidden> · 2021-08-19
Re: [PATCH 1/5] mm: Add support for unaccepted memory · David Hildenbrand <hidden> · 2021-08-19
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 1/5] mm: Add support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
[PATCH 5/5] x86/tdx: Unaccepted memory support · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Dave Hansen <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-10
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Joerg Roedel <hidden> · 2021-08-12
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Andi Kleen <hidden> · 2021-08-12
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Kirill A. Shutemov <hidden> · 2021-08-12
Re: [PATCH 0/5] x86: Impplement support for unaccepted memory · Joerg Roedel <hidden> · 2021-08-13

From: Kirill A. Shutemov <hidden>
Date: 2021-08-12 21:08:38
Also in: linux-coco, lkml

On Tue, Aug 10, 2021 at 01:50:57PM -0700, Dave Hansen wrote:

On 8/10/21 11:13 AM, Dave Hansen wrote:

quoted

@@ -1001,6 +1004,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
 	if (page_reported(page))
 		__ClearPageReported(page);
 
+	if (PageOffline(page))
+		clear_page_offline(page, order);
+
 	list_del(&page->lru);
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);

So, this is right in the fast path of the page allocator.  It's a
one-time thing per 2M page, so it's not permanent.

*But* there's both a global spinlock and a firmware call hidden in
clear_page_offline().  That's *GOT* to hurt if you were, for instance,
running a benchmark while this code path is being tickled.  Not just to

That could be just downright catastrophic for scalability, albeit
temporarily.

One more thing...

How long are these calls?  You have to make at least 512 calls into the
SEAM module.  Assuming they're syscall-ish, so ~1,000 cycles each,
that's ~500,000 cycles, even if we ignore the actual time it takes to
zero that 2MB worth of memory and all other overhead within the SEAM module.

I hope to get away with 2 calls per 2M: one MapGPA and one TDACCEPTPAGE
(or 3 for MAXORDER -- 4M -- pages). I don't have any numbers yet.

So, we're sitting on one CPU with interrupts off, blocking all the other
CPUs from doing page allocation in this zone.

I agree that's not good. Let's see if it's going to be okay with accepting
in 2M chunks.

Then, we're holding a global lock which prevents any other NUMA nodes
from accepting pages.

Looking at this again, the global lock is aviodable: the caller owns the
pfn range so nobody can touch these bits in the bitmap. We can replace
bitmap_clear() with atomic clear_bit() loop and drop the lock completely.

If the other node happens to *try* to do an
accept, it will sit with its zone lock held waiting for this one.

Maybe nobody will ever notice.  But, it seems like an awfully big risk
to me.  I'd at least *try* do these calls outside of the zone lock.
Then the collateral damage will at least be limited to things doing
accepts rather than all zone->lock users.

Couldn't we delay the acceptance to, say the place where we've dropped
the zone->lock and do the __GFP_ZERO memset() like at prep_new_page()?
Or is there some concern that the page has been split at that point?

It *will* be split by the point. Like if you ask for order-0 page and you
don't any left page allocator will try higher orders until finds anything.
On order-9 it would hit unaccepted. At that point the page going to split
and put on the free lists accordingly. That's all happens under zone lock.

  __rmqueue_smallest ->
    del_page_from_free_list()
    expand()

I guess that makes it more complicated because you might have a 4k page
but you need to go accept a 2M page.  You might end up having to check
the bitmap 511 more times because you might see 511 more PageOffline()
pages come through.

You shouldn't even need the bitmap lock to read since it's a one-way
trip from unaccepted->accepted.

Yeah. Unless we don't want to flip it back on making the range share.
I think we do. Otherwise it will cause problems for kexec.

-- 
 Kirill A. Shutemov

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help