Re: [RFC] virtio-net: help live migrate SR-IOV devices

From: Stephen Hemminger <stephen@networkplumber.org>
Date: 2017-11-30 04:10:18

On Wed, 29 Nov 2017 19:51:38 -0800
Jakub Kicinski [off-list ref] wrote:

On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:

On 2017-11-29 03:27, Jesse Brandeburg wrote:

Hi, I'd like to get some feedback on a proposal to enhance
virtio-net to ease configuration of a VM and that would enable
live migration of passthrough network SR-IOV devices.

Today we have SR-IOV network devices (VFs) that can be passed
into a VM in order to enable high-performance networking directly
within the VM. The problem I am trying to address is that this
configuration is generally difficult to live-migrate.  There is
documentation [1] indicating that some OS/Hypervisor vendors will
support live migration of a system with a direct assigned
networking device.  The problem I see with these implementations
is that the network configuration requirements that are passed on
to the owner of the VM are quite complicated.  You have to set up
bonding, you have to configure it to enslave two interfaces,
those interfaces (one is virtio-net, the other is SR-IOV
device/driver like ixgbevf) must support MAC address changes
requested in the VM, and on and on...

So, on to the proposal:
Modify virtio-net driver to be a single VM network device that
enslaves an SR-IOV network device (inside the VM) with the same
MAC address. This would cause the virtio-net driver to appear and
work like a simplified bonding/team driver.  The live migration
problem would be solved just like today's bonding solution, but
the VM user's networking config would be greatly simplified.

At its simplest, it would appear something like this in the VM.

==========
= vnet0  =
          =============
(virtio- =       |
  net)    =       |
          =  ===========
          =  = ixgbevf =
==========  ===========

(forgive the ASCII art)

The fast path traffic would prefer the ixgbevf or other SR-IOV
device path, and fall back to virtio's transmit/receive when
migrating.

Compared to today's options this proposal would
1) make virtio-net more sticky, allow fast path traffic at SR-IOV
    speeds
2) simplify end user configuration in the VM (most if not all of
the set up to enable migration would be done in the hypervisor)
3) allow live migration via a simple link down and maybe a PCI
    hot-unplug of the SR-IOV device, with failover to the
virtio-net driver core
4) allow vendor agnostic hardware acceleration, and live migration
    between vendors if the VM OS has driver support for all the
required SR-IOV devices.

Runtime operation proposed:
- <in either order> virtio-net driver loads, SR-IOV driver loads
- virtio-net finds other NICs that match its MAC address, both by
   examining existing interfaces and by setting up a new-device
notifier
- virtio-net enslaves the first NIC with the same MAC address
- virtio-net brings up the slave, and makes it the "preferred"
path
- virtio-net follows the behavior of an active backup bond/team
- virtio-net acts as the interface to the VM
- live migration initiates
- link goes down on SR-IOV, or SR-IOV device is removed
- failover to virtio-net as primary path
- migration continues to new host
- new host is started with virtio-net as primary
- if no SR-IOV, virtio-net stays primary
- hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
- virtio-net notices new NIC and starts over at enslave step above
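
The runtime steps above can be modelled in a few lines. This is only a toy userspace sketch of the proposed behavior, not kernel code: the names VirtioNetMaster, Nic, on_device_added, on_device_removed and active_path are hypothetical, and the MAC addresses are made up. It assumes the master enslaves the first NIC with a matching MAC and prefers it whenever its link is up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nic:
    """A candidate slave device, e.g. an SR-IOV VF (hypothetical model)."""
    name: str
    mac: str
    link_up: bool = True


@dataclass
class VirtioNetMaster:
    """Toy model of virtio-net acting as an active-backup master."""
    name: str
    mac: str
    slave: Optional[Nic] = None

    def on_device_added(self, nic: Nic) -> None:
        # Enslave only the first NIC whose MAC address matches our own.
        if self.slave is None and nic.mac.lower() == self.mac.lower():
            self.slave = nic

    def on_device_removed(self, nic: Nic) -> None:
        # Hot-unplug of the enslaved device frees the slot for a later hot-add.
        if self.slave is nic:
            self.slave = None

    def active_path(self) -> str:
        # Prefer the slave while its link is up; otherwise use virtio itself.
        if self.slave is not None and self.slave.link_up:
            return self.slave.name
        return self.name


if __name__ == "__main__":
    vnet0 = VirtioNetMaster("vnet0", "52:54:00:12:34:56")
    vf = Nic("ixgbevf0", "52:54:00:12:34:56")
    vnet0.on_device_added(vf)
    print(vnet0.active_path())   # ixgbevf0: fast path through the VF

    vf.link_up = False           # migration starts, VF link drops
    print(vnet0.active_path())   # vnet0: failover to virtio-net

    vnet0.on_device_removed(vf)  # VF is hot-unplugged
    new_vf = Nic("ixgbevf1", "52:54:00:12:34:56")
    vnet0.on_device_added(new_vf)  # new host hot-adds a VF with the same MAC
    print(vnet0.active_path())   # ixgbevf1: back on the fast path
```

In a real driver the two on_device_* hooks would correspond to a netdevice notifier, and the path choice would be made per packet in the transmit routine.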

Future ideas (brainstorming):
- Optimize fast east-west traffic by having special rules that
direct it through the virtio-net traffic path

Thanks for reading!
Jesse

Cc netdev.

Interesting, and this method is actually used by netvsc now:

commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
Author: stephen hemminger [off-list ref]
Date:   Tue Aug 1 19:58:53 2017 -0700

     netvsc: transparent VF management

     This patch implements transparent fail over from synthetic NIC
to SR-IOV virtual function NIC in Hyper-V environment. It is a
better alternative to using bonding as is done now. Instead, the
receive and transmit fail over is done internally inside the driver.

     Using bonding driver has lots of issues because it depends on
the script being run early enough in the boot process and with
sufficient information to make the association. This patch moves
all that functionality into the kernel.

     Signed-off-by: Stephen Hemminger [off-list ref]
     Signed-off-by: David S. Miller [off-list ref]

If my understanding is correct, there's no need for any extension
of the virtio spec. If this is true, maybe you can start to prepare
the patch?

IMHO this is as close to policy in the kernel as one can get.  User
land has all the information it needs to instantiate that bond/team
automatically.  In fact I'm trying to discuss this with NetworkManager
folks and Red Hat right now:

https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html

Can we flip the argument and ask why is the kernel supposed to be
responsible for this?  It's not like we run DHCP out of the kernel
on new interfaces...

Although "policy should not be in the kernel" is a great mantra,
it is not practical in the real world.

If you think it can be solved in userspace, then you haven't had to
deal with four different network initialization systems, multiple
orchestration systems, and customers on ancient enterprise
distributions.
