Thread (49 messages) 49 messages, 6 authors, 2021-08-19

Re: [PATCH 0/5] x86: Impplement support for unaccepted memory

From: Kirill A. Shutemov <hidden>
Date: 2021-08-10 17:31:56
Also in: linux-mm, lkml

On Tue, Aug 10, 2021 at 08:51:01AM -0700, Dave Hansen wrote:
In other words, I buy the boot speed argument.  But, I don't buy the
"this saves memory long term" argument at all.
Okay, that's a fair enough. I guess there's *some* workloads that may
have memory footprint reduced, but I agree it's minority.
quoted
quoted
I had expected this series, but I also expected it to be connected to
CONFIG_DEFERRED_STRUCT_PAGE_INIT somehow.  Could you explain a bit how
this problem is different and demands a totally orthogonal solution?

For instance, what prevents us from declaring: "Memory is accepted at
the time that its 'struct page' is initialized" ?  Then, we use all the
infrastructure we already have for DEFERRED_STRUCT_PAGE_INIT.
That was my first thought too and I tried it just to realize that it is
not what we want. If we would accept page on page struct init it means we
would make host allocate all memory assigned to the guest on boot even if
guest actually use small portion of it.

Also deferred page init only allows to scale memory accept across multiple
CPUs, but doesn't allow to get to userspace before we done with it. See
wait_for_completion(&pgdat_init_all_done_comp).
That's good information.  It's a refinement of the "I want to boot
faster" requirement.  What you want is not just going _faster_, but
being able to run userspace before full acceptance has completed.

Would you be able to quantify how fast TDX page acceptance is?  Are we
talking about MB/s, GB/s, TB/s?  This series is rather bereft of numbers
for a feature which making a performance claim.

Let's say we have a 128GB VM.  How much does faster does this approach
reach userspace than if all memory was accepted up front?  How much
memory _could_ have been accepted at the point userspace starts running?
Acceptance code is not optimized yet: we accept memory in 4k chunk which
is very slow because hypercall overhead dominates the picture.

As of now, kernel boot time of 1 VCPU and 64TiB VM with upfront memory
accept is >20 times slower than with this lazy memory accept approach.

The difference is going to be substantially lower once we get it optimized
properly.

-- 
 Kirill A. Shutemov
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help