Thread: 16 messages, 4 authors, 2016-09-01

RE: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration

From: Li, Liang Z <hidden>
Date: 2016-09-01 05:46:47
Also in: linux-mm, lkml, qemu-devel, virtualization

Subject: Re: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating
& fast live migration

2016-08-08 14:35 GMT+08:00 Liang Li [off-list ref]:
This patch set contains two parts of changes to the virtio-balloon.

One is the change for speeding up the inflating & deflating process.
The main idea of this optimization is to use a bitmap, instead of the
PFNs, to send the page information to the host, which reduces the
overhead of virtio data transmission, address translation and madvise().
This can help to improve the performance by about 85%.
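
To make the size argument concrete, here is a minimal sketch of packing a set of PFNs into a bitmap. It is not the patch's code: the helper names, the choice of a base PFN, and the 4-byte-per-PFN array entry are assumptions made for illustration only.

```python
# Illustrative only: names and sizes below are assumptions, not the patch's code.
PFN_ENTRY_BYTES = 4  # assumption: each PFN in the array format takes 32 bits


def pfns_to_bitmap(pfns, base_pfn, nbits):
    """Pack PFNs into a bitmap where bit i stands for PFN base_pfn + i."""
    bitmap = bytearray((nbits + 7) // 8)
    for pfn in pfns:
        offset = pfn - base_pfn
        if not 0 <= offset < nbits:
            raise ValueError(f"PFN {pfn:#x} is outside the bitmap range")
        bitmap[offset // 8] |= 1 << (offset % 8)
    return bitmap


def bitmap_to_pfns(bitmap, base_pfn):
    """Recover the PFN list on the receiving side."""
    return [
        base_pfn + byte_idx * 8 + bit
        for byte_idx, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]


# 1024 contiguous pages starting at an arbitrary example PFN.
base = 0x1000
pfns = list(range(base, base + 1024))
bitmap = pfns_to_bitmap(pfns, base, 1024)

pfn_array_bytes = len(pfns) * PFN_ENTRY_BYTES
bitmap_bytes = len(bitmap)
print(pfn_array_bytes, bitmap_bytes)
```

The bitmap only wins when the pages are reasonably dense within the covered range; for a few widely scattered pages, a PFN list would be smaller.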

Another change is for speeding up live migration. Skipping the
processing of the guest's free pages in the first round of data copy
avoids needless data processing, which can save quite a lot of CPU
cycles and network bandwidth. We put the guest's free page information
in a bitmap and send it to the host through the virtqueue of
virtio-balloon. For an idle 8GB guest, this can shorten the total live
migration time from 2 s to about 500 ms in a 10 Gbps network environment.
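
As a rough sketch of the idea described above, the first copy round can consult the guest's free page bitmap and skip every page marked free. The function names and the bit layout (bit i set means page i is free) are hypothetical and chosen for illustration; they are not taken from the patch set or from QEMU.

```python
# Illustrative only: function names and bitmap layout are assumptions.

def is_free(free_bitmap, page):
    """Return True if the guest reported this page as free."""
    return bool(free_bitmap[page // 8] & (1 << (page % 8)))


def first_round_pages(total_pages, free_bitmap):
    """Pages to process in the first round: everything not marked free."""
    return [p for p in range(total_pages) if not is_free(free_bitmap, p)]


# A toy guest with 16 pages, 10 of which are reported free.
total_pages = 16
free_bitmap = bytearray(2)
for page in (2, 3, 4, 5, 10, 11, 12, 13, 14, 15):
    free_bitmap[page // 8] |= 1 << (page % 8)

to_send = first_round_pages(total_pages, free_bitmap)
print(to_send)
```

In this toy case only 6 of 16 pages are examined and sent in the first round; for a mostly idle guest the skipped fraction would be much larger.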
I just read the slides on this feature from the recent KVM Forum. Cloud
providers care more about live migration downtime than about total
time, because downtime is what customers perceive. However, this feature
increases downtime while gaining the benefit of reduced total time.
Maybe it will be more acceptable if there is no downside for downtime.

Regards,
Wanpeng Li
In theory, there is no factor that will increase the downtime. There is no additional operation
and no extra data copy during the stop-and-copy stage. But in the test, the downtime does increase,
and this can be reproduced. I think a busy network link may be the reason for this. With this
optimization, a huge amount of data is written to the socket in a shorter time, so some of the write
operations may need to wait. Without this optimization, zero page checking takes more time, so
the network is not so busy.

If the guest is not an idle one, I think the gap in downtime will not be so obvious. Anyway, the
downtime is still less than the max_down_time set by the user.

Thanks!
Liang