Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
From: Jann Horn <jannh@google.com>
Date: 2022-01-28 17:58:39
Also in:
lkml
On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet [off-list ref] wrote:
On Thu, Jan 27, 2022 at 6:14 PM Jann Horn [off-list ref] wrote:quoted
On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet [off-list ref] wrote:quoted
On Thu, Jan 27, 2022 at 5:43 PM Jann Horn [off-list ref] wrote:quoted
I've run into a bug where dev_hold() was being called after netdev_wait_allrefs(). But at that point, the device is already going away, and dev_hold() can't stop that anymore. To make such problems easier to diagnose in the future: - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether the net refcount has been elevated. If this is detected, WARN() and leak the object (to prevent worse consequences from a use-after-free). - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. This signals to the generic refcount infrastructure that any attempt to increment the refcount later is a bug.
[...]
quoted
if (dev->reg_state == NETREG_UNREGISTERING) { ASSERT_RTNL(); dev->needs_free_netdev = true; return; } /* Recheck in case someone called dev_hold() between * netdev_wait_allrefs() and here. */ if (WARN_ON(netdev_refcnt_read(dev) != 0)) return; /* leak memory, otherwise we might get UAF */ netif_free_tx_queues(dev); netif_free_rx_queues(dev);Maybe another solution would be to leverage the recent dev_hold_track(). We could add a dead boolean to 'struct ref_tracker_dir ' (dev->refcnt_tracker)
Hmm... actually, what even are the semantics of dev_hold()? Normal refcounts have the property that if you hold one reference, you're always allowed to add another reference. But from what I can tell, something like this: struct net_device *dev = dev_get_by_name(net, name); dev_hold(dev); dev_put(dev); dev_put(dev); would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation. Basically, if dev_hold() runs at the same time as netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is non-atomic, and we could get the following race: task B: starts netdev_refcnt_read() task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0) task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0) task A: migrates from CPU 0 to CPU 1 task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1) task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1) which would make task B miss one outstanding reference. (This is why the generic percpu refcounting code in lib/percpu-refcount.c has logic for switching the refcount to atomic mode with an RCU grace period.) If these are the intended semantics for dev_hold(), then I guess your approach of adding a new boolean flag somewhere is the right one - but we should be setting that flag *before* waiting for the refcount to drop to 1. Though maybe it shouldn't be in ref-tracker, since this is a peculiarity of the hand-rolled netdev refcount... Are these the intended semantics (and I should rewrite the patch to also catch dev_hold() racing with netdev_wait_allrefs()), or is this unintended (and the netdev refcount should be replaced)? This should probably be documented...