Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()

From: Jann Horn <jannh@google.com>
Date: 2022-01-28 17:58:39
Also in: lkml

On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet [off-list ref] wrote:

On Thu, Jan 27, 2022 at 6:14 PM Jann Horn [off-list ref] wrote:

quoted

On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet [off-list ref] wrote:

quoted

On Thu, Jan 27, 2022 at 5:43 PM Jann Horn [off-list ref] wrote:

quoted

I've run into a bug where dev_hold() was being called after
netdev_wait_allrefs(). But at that point, the device is already going
away, and dev_hold() can't stop that anymore.

To make such problems easier to diagnose in the future:

 - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
   the net refcount has been elevated. If this is detected, WARN() and
   leak the object (to prevent worse consequences from a
   use-after-free).
 - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
   This signals to the generic refcount infrastructure that any attempt
   to increment the refcount later is a bug.

[...]

quoted

        if (dev->reg_state == NETREG_UNREGISTERING) {
                ASSERT_RTNL();
                dev->needs_free_netdev = true;
                return;
        }

        /* Recheck in case someone called dev_hold() between
         * netdev_wait_allrefs() and here.
         */
        if (WARN_ON(netdev_refcnt_read(dev) != 0))
                return; /* leak memory, otherwise we might get UAF */

        netif_free_tx_queues(dev);
        netif_free_rx_queues(dev);

Maybe another solution would be to leverage the recent dev_hold_track().

We could add a  dead boolean to 'struct  ref_tracker_dir ' (dev->refcnt_tracker)

Hmm... actually, what even are the semantics of dev_hold()?

Normal refcounts have the property that if you hold one reference,
you're always allowed to add another reference. But from what I can
tell, something like this:

struct net_device *dev = dev_get_by_name(net, name);
dev_hold(dev);
dev_put(dev);
dev_put(dev);

would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation.
Basically, if dev_hold() runs at the same time as
netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is
non-atomic, and we could get the following race:

task B: starts netdev_refcnt_read()
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A: migrates from CPU 0 to CPU 1
task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1)
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1)

which would make task B miss one outstanding reference.

(This is why the generic percpu refcounting code in
lib/percpu-refcount.c has logic for switching the refcount to atomic
mode with an RCU grace period.)


If these are the intended semantics for dev_hold(), then I guess your
approach of adding a new boolean flag somewhere is the right one - but
we should be setting that flag *before* waiting for the refcount to
drop to 1. Though maybe it shouldn't be in ref-tracker, since this is
a peculiarity of the hand-rolled netdev refcount...

Are these the intended semantics (and I should rewrite the patch to
also catch dev_hold() racing with netdev_wait_allrefs()), or is this
unintended (and the netdev refcount should be replaced)?

This should probably be documented...

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help