Thread (12 messages) 12 messages, 3 authors, 2022-01-28

Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()

From: Jann Horn <jannh@google.com>
Date: 2022-01-28 17:58:39
Also in: lkml

On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet [off-list ref] wrote:
On Thu, Jan 27, 2022 at 6:14 PM Jann Horn [off-list ref] wrote:
quoted
On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet [off-list ref] wrote:
quoted
On Thu, Jan 27, 2022 at 5:43 PM Jann Horn [off-list ref] wrote:
quoted
I've run into a bug where dev_hold() was being called after
netdev_wait_allrefs(). But at that point, the device is already going
away, and dev_hold() can't stop that anymore.

To make such problems easier to diagnose in the future:

 - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
   the net refcount has been elevated. If this is detected, WARN() and
   leak the object (to prevent worse consequences from a
   use-after-free).
 - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
   This signals to the generic refcount infrastructure that any attempt
   to increment the refcount later is a bug.
[...]
quoted
        if (dev->reg_state == NETREG_UNREGISTERING) {
                ASSERT_RTNL();
                dev->needs_free_netdev = true;
                return;
        }

        /* Recheck in case someone called dev_hold() between
         * netdev_wait_allrefs() and here.
         */
        if (WARN_ON(netdev_refcnt_read(dev) != 0))
                return; /* leak memory, otherwise we might get UAF */

        netif_free_tx_queues(dev);
        netif_free_rx_queues(dev);
Maybe another solution would be to leverage the recent dev_hold_track().

We could add a  dead boolean to 'struct  ref_tracker_dir ' (dev->refcnt_tracker)
Hmm... actually, what even are the semantics of dev_hold()?

Normal refcounts have the property that if you hold one reference,
you're always allowed to add another reference. But from what I can
tell, something like this:

struct net_device *dev = dev_get_by_name(net, name);
dev_hold(dev);
dev_put(dev);
dev_put(dev);

would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation.
Basically, if dev_hold() runs at the same time as
netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is
non-atomic, and we could get the following race:

task B: starts netdev_refcnt_read()
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A: migrates from CPU 0 to CPU 1
task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1)
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1)

which would make task B miss one outstanding reference.

(This is why the generic percpu refcounting code in
lib/percpu-refcount.c has logic for switching the refcount to atomic
mode with an RCU grace period.)


If these are the intended semantics for dev_hold(), then I guess your
approach of adding a new boolean flag somewhere is the right one - but
we should be setting that flag *before* waiting for the refcount to
drop to 1. Though maybe it shouldn't be in ref-tracker, since this is
a peculiarity of the hand-rolled netdev refcount...

Are these the intended semantics (and I should rewrite the patch to
also catch dev_hold() racing with netdev_wait_allrefs()), or is this
unintended (and the netdev refcount should be replaced)?

This should probably be documented...
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help