
[PATCH v1 net-next 00/14] net: Support per-netns device unregistration

From: Kuniyuki Iwashima <kuniyu@google.com>
Date: 2026-07-01 21:43:43

The biggest blocker to per-netns RTNL is netdev unregistration.

It starts within a single netns, but it can eventually involve
multiple namespaces.

There are three types of such cross-netns devices:

  1. Paired devices (e.g., netkit, veth, vxcan)
     -> Unregistering one device also deletes its peer, which
        may reside in another netns.

  2. Tunnel devices (e.g., bareudp, geneve, etc.)
     -> Destroying a netns removes devices in another netns if
        their backend sockets reside in the dying netns.

  3. Stacked devices (e.g., ipvlan, macvlan, etc.)
     -> Removing the lower device also removes multiple upper
        devices, each of which may reside in different namespaces.

While the first two device types require at most two rtnl_net_lock()s,
the stacked type has no upper limit.  This makes it impossible to
freeze all necessary namespaces in advance.

This series introduces per-netns work, initially suggested at
NetConf 2024, to delegate the unregistration of such cross-netns
devices.

  https://netdev.bots.linux.dev/netconf/2024/kuniyu.pdf#page=62

The first half of the series wraps NETDEV_UNREGISTER (in core) with
per-netns RTNL, adds a helper for per-netns device unregistration,
and forces per-netns device unregistration in the core code when
CONFIG_DEBUG_NET_SMALL_RTNL=y.

The latter half picks one driver of each type (veth, bareudp, ipvlan)
and converts it to support per-netns device unregistration,
although the operations are **still serialised under RTNL** for now.

Please note that this series focuses only on the device unregistration
paths.  For example, ASSERT_RTNL()s remain in other paths, and
Sashiko may point them out, but they are out of scope.

This is just the first step, and we need more incremental changes to
completely remove RTNL anyway.

With this series, unregistering a lower device (veth0 below) removes
its upper devices (ipvl2, ipvl3) in other namespaces via per-netns
work, which runs in a different task (PID 440) from the one that
issued the deletion (PID 2010).  The lower device (veth0) is freed
only after all upper ipvlan devices have called netdev_put() in
ipvlan_uninit().

  # ip netns add ns1
  # ip netns add ns2
  # ip netns add ns3
  # ip -n ns1 link add veth0 type veth peer veth1
  # ip -n ns2 link add ipvl2 link veth0 link-netns ns1 type ipvlan mode l2
  # ip -n ns3 link add ipvl3 link veth0 link-netns ns1 type ipvlan mode l2
  # ip -n ns1 link del veth0

  # bpftrace -e '#include <linux/netdevice.h>
  kprobe:ipvlan_uninit,
  kprobe:veth_dellink,
  kprobe:free_netdev {
      $dev = (struct net_device *)arg0;
      printf("PID: %d | DEV: %s%s\n", pid, $dev->name, kstack());
  }'

  PID: 2010 | DEV: veth0
          veth_dellink+5
          rtnl_dellink+1213
          rtnetlink_rcv_msg+1791
  ...
  PID: 440 | DEV: ipvl2
          ipvlan_uninit+5
          unregister_netdevice_many_notify+7129
          unregister_netdevice_many_net+1050
          rtnl_net_work_func+136
  ...
  PID: 440 | DEV: ipvl2
          free_netdev+5
          netdev_run_todo+4798
          process_scheduled_works+2538
  ...
  PID: 440 | DEV: ipvl3
          ipvlan_uninit+5
          unregister_netdevice_many_notify+7129
          unregister_netdevice_many_net+1050
          rtnl_net_work_func+136
          process_scheduled_works+2538
  ...
  PID: 2010 | DEV: veth0
          free_netdev+5
          netdev_run_todo+4798
          rtnl_dellink+1507
          rtnetlink_rcv_msg+1791
  ...
  PID: 440 | DEV: ipvl3
          free_netdev+5
          netdev_run_todo+4798
          process_scheduled_works+2538
  ...


Kuniyuki Iwashima (14):
  rtnetlink: Lock sock_net(skb->sk) in rtnl_newlink().
  rtnetlink: Call unregister_netdevice_many() only once in
    rtnl_link_unregister().
  rtnetlink: Add per-netns rtnl_work.
  net: Wrap default_device_exit_net() with __rtnl_net_lock().
  net: Hold __rtnl_net_lock() in netdev_wait_allrefs_any().
  net: Add per-netns netdev unregistration infra.
  net: Call unregister_netdevice_many() per netns.
  veth: Support per-netns device unregistration.
  bareudp: Protect bareudp_list with mutex.
  bareudp: Support per-netns netdev unregistration.
  ipvlan: Convert ipvl_port.count to refcount_t.
  ipvlan: Synchronise ipvlan_init() and ipvlan_uninit() for the same
    lower dev.
  ipvlan: Protect ipvl_port.ipvlans with mutex.
  ipvlan: Support per-netns netdev unregistration.

 drivers/net/bareudp.c            |  43 ++++++++-
 drivers/net/ipvlan/ipvlan.h      |  18 +++-
 drivers/net/ipvlan/ipvlan_main.c | 153 +++++++++++++++++++++++++------
 drivers/net/ipvlan/ipvtap.c      |  16 ++--
 drivers/net/veth.c               |  34 ++++---
 include/linux/netdevice.h        |  22 +++++
 include/linux/rtnetlink.h        |   8 ++
 include/net/net_namespace.h      |   3 +
 net/core/dev.c                   | 129 +++++++++++++++++++++++++-
 net/core/net_namespace.c         |   4 +
 net/core/rtnetlink.c             |  57 ++++++++++--
 11 files changed, 418 insertions(+), 69 deletions(-)

-- 
2.55.0.rc0.799.gd6f94ed593-goog