Re: [BUG] net_device UAF: linkwatch_fire_event() calls dev_hold() after... | netdev

Re: [BUG] net_device UAF: linkwatch_fire_event() calls dev_hold() after netdev_wait_allrefs() is done

From: Jann Horn <jannh@google.com>
Date: 2022-02-01 22:46:48
Also in: lkml

On Fri, Jan 28, 2022 at 3:19 AM Jakub Kicinski [off-list ref] wrote:

Interesting..

I don't know what link_reset does, but since it turns the carrier on it
seems like something that should be flushed/canceled when the device
goes down. unregister brings the device down under rtnl_lock.

On Fri, 28 Jan 2022 02:51:24 +0100 Jann Horn wrote:

quoted

Is the bug that usbnet_disconnect() should be stopping &dev->kevent
before calling unregister_netdev()?

I'd say not this one, I think the generally agreed on semantics are that
the netdev is under users control between register and unregister, we
should not cripple it before unregister.

quoted

Or is the bug that ax88179_link_reset() doesn't take some kind of lock
and re-check that the netdev is still alive?

That'd not be an uncommon way to fix this.. taking rtnl_lock, not even
a driver lock in similar.

Ah, I found a comment with a bit of explanation on how this is
supposed to work... usbnet_stop() explains:

    /* deferred work (task, timer, softirq) must also stop.
     * can't flush_scheduled_work() until we drop rtnl (later),
     * else workers could deadlock; so make workers a NOP.
     */

And usbnet_stop() is ->ndo_stop(), which indeed runs under RTNL.

I wonder what the work items can do that'd conflict with RTNL... or is
the comment just talking about potential issues if a bunch of *other*
work items need RTNL and clog up the system_wq so that
flush_scheduled_work() blocks forever?

If it's the latter case, I guess we could instead do cancel_work_sync() and
then maybe re-run the work function's handler one more time
synchronously?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help