Re: [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
Date: 2021-08-06 09:35:04
Also in:
kvm, linux-s390, lkml
On Fri, 6 Aug 2021 09:10:28 +0200 David Hildenbrand [off-list ref] wrote:
> On 04.08.21 17:40, Claudio Imbrenda wrote:
>> Previously, when a protected VM was rebooted or when it was shut down, its memory was made unprotected, and then the protected VM itself was destroyed. Looping over the whole address space can take some time, considering the overhead of the various Ultravisor Calls (UVCs). This means that a reboot or a shutdown would take a potentially long amount of time, depending on the amount of used memory.
>>
>> This patch series implements a deferred destroy mechanism for protected guests. When a protected guest is destroyed, its memory is cleared in the background, allowing the guest to restart or terminate significantly faster than before.
>>
>> There are 2 possibilities when a protected VM is torn down:
>> * it still has an address space associated (reboot case)
>> * it does not have an address space anymore (shutdown case)
>>
>> For the reboot case, the reference count of the mm is increased, and then a background thread is started to clean up. Once the thread has gone through the whole address space, the protected VM is actually destroyed.
>
> That doesn't sound too hacky to me, and actually sounds like a good idea, doing what the guest would do either way but speeding it up asynchronously, but ...
>
>> For the shutdown case, a list of pages to be destroyed is formed when the mm is torn down. Instead of just unmapping the pages when the address space is being torn down, they are also set aside. Later when KVM cleans up the VM, a thread is started to clean up the pages from the list.
>
> ... this ...
>
>> This means that the same address space can have memory belonging to more than one protected guest, although only one will be running, the others will in fact not even have any CPUs.
>
> ... this ...

this ^ is exactly the reboot case.
>> When a guest is destroyed, its memory still counts towards its memory control group until it's actually freed (I tested this experimentally).
>>
>> When the system runs out of memory, if a guest has terminated and its memory is being cleaned asynchronously, the OOM killer will wait a little and then see if memory has been freed. This has the practical effect of slowing down memory allocations when the system is out of memory to give the cleanup thread time to clean up and free memory, and avoid an actual OOM situation.
>
> ... and this sounds like the kind of arch MM hacks that will bite us in the long run. Of course, I might be wrong, but already doing excessive GFP_ATOMIC allocations or messing with the OOM killer that

They are GFP_ATOMIC, but they should not put too much weight on the memory and can also fail without consequences. I used:

GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN

Also notice that after every page allocation a page gets freed, so this is only temporary.

I would not call it "messing with the OOM killer"; I'm using the same interface used by virtio-balloon.

> way for a pure (shutdown) optimization is an alarm signal. Of course, I might be wrong.
>
> You should at least CC linux-mm. I'll do that right now and also CC Michal. He might have time to have a quick glimpse at patch #11 and #13.
>
> https://lkml.kernel.org/r/20210804154046.88552-12-imbrenda@linux.ibm.com
> https://lkml.kernel.org/r/20210804154046.88552-14-imbrenda@linux.ibm.com
>
> IMHO, we should proceed with patch 1-10, as they solve a really important problem ("slow reboots") in a nice way, whereby patch 11 handles a case that can be worked around comparatively easily by management tools -- my 2 cents.

How would management tools work around the issue that a shutdown can take very long?

Also, without my patches, the shutdown case would use export instead of destroy, making it even slower.
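
For readers following the thread, the shutdown case from the cover letter (pages set aside at mm teardown, then cleaned up later by a thread) can be modeled in userspace roughly as below. This is only a toy sketch, not the code from the series: the class LazyDestroyModel, its methods, and the can_track parameter are all hypothetical names made up for illustration. The can_track=False path, which destroys the page synchronously, is an assumption about what "can also fail without consequences" could mean for the best-effort bookkeeping allocation; the thread does not spell out the actual fallback.

```python
import threading
from collections import deque


class LazyDestroyModel:
    """Userspace toy model of a deferred page-destroy list (not kernel code)."""

    def __init__(self, destroy_page):
        # destroy_page stands in for the expensive per-page operation.
        self._destroy_page = destroy_page
        self._pending = deque()
        self._lock = threading.Lock()

    def unmap_page(self, page, can_track=True):
        """Called during address-space teardown.

        Sets the page aside for later cleanup. can_track=False models a
        failed best-effort bookkeeping allocation; in that case the page
        is destroyed immediately (assumed fallback). Returns True if the
        page was deferred.
        """
        if not can_track:
            self._destroy_page(page)
            return False
        with self._lock:
            self._pending.append(page)
        return True

    def pending(self):
        """Number of pages still waiting to be destroyed."""
        with self._lock:
            return len(self._pending)

    def start_cleanup(self):
        """Start the background thread that drains the deferred list."""
        worker = threading.Thread(target=self._drain)
        worker.start()
        return worker

    def _drain(self):
        while True:
            with self._lock:
                if not self._pending:
                    return
                page = self._pending.popleft()
            # Destroy outside the lock so teardown is never blocked on it.
            self._destroy_page(page)
```

In this model the caller of unmap_page() returns as soon as the page is queued, which is the point of the optimization: the slow per-page work moves to the thread started by start_cleanup(), and the caller only waits on it if it chooses to join() the returned thread.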