Re: [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy

From: Claudio Imbrenda <imbrenda@linux.ibm.com>
Date: 2021-08-06 09:35:04
Also in: kvm, linux-s390, lkml

On Fri, 6 Aug 2021 09:10:28 +0200
David Hildenbrand [off-list ref] wrote:

On 04.08.21 17:40, Claudio Imbrenda wrote:

quoted

Previously, when a protected VM was rebooted or when it was shut
down, its memory was made unprotected, and then the protected VM
itself was destroyed. Looping over the whole address space can take
some time, considering the overhead of the various Ultravisor Calls
(UVCs). This means that a reboot or a shutdown would take a
potentially long amount of time, depending on the amount of used
memory.

This patch series implements a deferred destroy mechanism for
protected guests. When a protected guest is destroyed, its memory
is cleared in the background, allowing the guest to restart or
terminate significantly faster than before.

There are 2 possibilities when a protected VM is torn down:
* it still has an address space associated (reboot case)
* it does not have an address space anymore (shutdown case)

For the reboot case, the reference count of the mm is increased, and
then a background thread is started to clean up. Once the thread
has gone through the whole address space, the protected VM is actually
destroyed.

That doesn't sound too hacky to me, and actually sounds like a good 
idea, doing what the guest would do either way but speeding it up 
asynchronously, but ...

quoted

For the shutdown case, a list of pages to be destroyed is formed
when the mm is torn down. Instead of just unmapping the pages when
the address space is being torn down, they are also set aside.
Later when KVM cleans up the VM, a thread is started to clean up
the pages from the list.

... this ...

quoted

This means that the same address space can have memory belonging to
more than one protected guest, although only one will be running;
the others will in fact not even have any CPUs.

... this ...

this ^ is exactly the reboot case.

quoted

When a guest is destroyed, its memory still counts towards its
memory control group until it's actually freed (I tested this
experimentally).

When the system runs out of memory, if a guest has terminated and
its memory is being cleaned asynchronously, the OOM killer will
wait a little and then see if memory has been freed. This has the
practical effect of slowing down memory allocations when the system
is out of memory, to give the cleanup thread time to clean up and
free memory, and avoid an actual OOM situation.

... and this sounds like the kind of arch MM hacks that will bite us
in the long run. Of course, I might be wrong, but already doing
excessive GFP_ATOMIC allocations or messing with the OOM killer that

they are GFP_ATOMIC, but they should not put much pressure on memory
and can also fail without consequences. I used:

GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN

also notice that after every page allocation a page gets freed, so this
is only temporary.
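
To illustrate the pattern I mean, here is a minimal userspace sketch, not
the code from the patches. All names (lazy_page, lazy_list, try_alloc_node,
lazy_set_aside, lazy_drain) are hypothetical, malloc stands in for the
kernel allocator, and the synchronous fallback on allocation failure is my
assumption for the sake of the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracking node; not the structure used in the patch series. */
struct lazy_page {
	struct lazy_page *next;
	void *page;
};

struct lazy_list {
	struct lazy_page *head;
	size_t deferred;	/* pages queued for background cleanup */
	size_t immediate;	/* pages handled right away after a failed allocation */
};

/*
 * Stand-in for an allocation that is allowed to fail quietly, the role
 * played in the kernel by GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN.
 * The simulate_failure flag exists only to exercise the failure path.
 */
static struct lazy_page *try_alloc_node(bool simulate_failure)
{
	if (simulate_failure)
		return NULL;
	return malloc(sizeof(struct lazy_page));
}

/*
 * Set a page aside for later cleanup. If the tracking node cannot be
 * allocated, nothing breaks: the page is released immediately instead
 * (assumed fallback, chosen for this sketch).
 */
static void lazy_set_aside(struct lazy_list *l, void *page, bool simulate_failure)
{
	struct lazy_page *n = try_alloc_node(simulate_failure);

	if (!n) {
		free(page);
		l->immediate++;
		return;
	}
	n->page = page;
	n->next = l->head;
	l->head = n;
	l->deferred++;
}

/*
 * Cleanup pass: every iteration frees one page together with its small
 * tracking node, so the extra memory used for tracking is only temporary.
 * Returns the number of pages freed.
 */
static size_t lazy_drain(struct lazy_list *l)
{
	size_t freed = 0;

	while (l->head) {
		struct lazy_page *n = l->head;

		l->head = n->next;
		free(n->page);
		free(n);
		freed++;
	}
	assert(freed == l->deferred);
	l->deferred = 0;
	return freed;
}
```

The point of the sketch is only that a failed tracking allocation degrades
to the slow path rather than to an error, and that each tracking node is
released together with the page it tracks.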

I would not call it "messing with the OOM killer"; I'm using the same
interface used by virtio-balloon.

way for a pure (shutdown) optimization is an alarm signal. Of course,
I might be wrong.

You should at least CC linux-mm. I'll do that right now and also CC 
Michal. He might have time to have a quick glimpse at patch #11 and
#13.

https://lkml.kernel.org/r/20210804154046.88552-12-imbrenda@linux.ibm.com
https://lkml.kernel.org/r/20210804154046.88552-14-imbrenda@linux.ibm.com

IMHO, we should proceed with patch 1-10, as they solve a really 
important problem ("slow reboots") in a nice way, whereby patch 11 
handles a case that can be worked around comparatively easily by 
management tools -- my 2 cents.

how would management tools work around the issue that a shutdown can
take very long?

also, without my patches, the shutdown case would use export instead of
destroy, making it even slower.
