Thread (7 messages), 3 authors, 2025-08-02

Re: [syzbot] [net?] WARNING in __linkwatch_sync_dev (2)

From: Stanislav Fomichev <hidden>
Date: 2025-06-13 01:09:04
Also in: lkml
Subsystem: bonding driver, networking drivers, the rest · Maintainers: Jay Vosburgh, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds

On 06/11, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit:    f09079bd04a9 Merge tag 'powerpc-6.16-2' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16e9260c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e24211089078d6c6
dashboard link: https://syzkaller.appspot.com/bug?extid=b8c48ea38ca27d150063
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f09079bd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ef68cb3d29a3/vmlinux-f09079bd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/1cc9431b9a15/bzImage-f09079bd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com

------------[ cut here ]------------
RTNL: assertion failed at ./include/net/netdev_lock.h (72)
WARNING: CPU: 0 PID: 1141 at ./include/net/netdev_lock.h:72 netdev_ops_assert_locked include/net/netdev_lock.h:72 [inline]
WARNING: CPU: 0 PID: 1141 at ./include/net/netdev_lock.h:72 __linkwatch_sync_dev+0x1ed/0x230 net/core/link_watch.c:279
 ethtool_op_get_link+0x1d/0x70 net/ethtool/ioctl.c:63
 bond_check_dev_link+0x3f9/0x710 drivers/net/bonding/bond_main.c:863
 bond_miimon_inspect drivers/net/bonding/bond_main.c:2745 [inline]
 bond_mii_monitor+0x3c0/0x2dc0 drivers/net/bonding/bond_main.c:2967
 process_one_work+0x9cf/0x1b70 kernel/workqueue.c:3238
 process_scheduled_works kernel/workqueue.c:3321 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3402
 kthread+0x3c5/0x780 kernel/kthread.c:464
 ret_from_fork+0x5d4/0x6f0 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
netdev_ops_assert_locked is called for a non-ops-locked netdev, so we
hit the ASSERT_RTNL case. That is a bit misleading, but I noticed that
bond_miimon_inspect is running under the rcu read lock, which is not
gonna work for ops-locked devices :-/ (we want to grab the instance
lock for the CHANGE notifiers).

I'm contemplating dropping rcu and doing rtnl_trylock instead. Looking at
commit f0c76d61779b ("bonding: refactor mii monitor"), it doesn't look
like there were issues with rtnl performance, so hopefully it should be ok.

Because from my recent patches I remember this trace:

    [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
    [ 3456.660039]  ipv6_find_idev+0x96/0xe0
    [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
    [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
    [ 3456.661803]  addrconf_notify+0x35f/0x8d0
    [ 3456.662236]  notifier_call_chain+0x38/0xf0
    [ 3456.662676]  netdev_state_change+0x65/0x90
    [ 3456.663112]  linkwatch_do_dev+0x5a/0x70

Here linkwatch_do_dev (potentially called from ethtool_op_get_link and
bond_check_dev_link) might trigger ipv6 address assignment, so I'm not
sure how this is all supposed to work under rcu and without the rtnl lock.

Tentatively (untested uncompiled):
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c4d53e8e7c15..e2c4bcdb8b1a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2739,7 +2739,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 			ignore_updelay = true;
 	}
 
-	bond_for_each_slave_rcu(bond, slave, iter) {
+	bond_for_each_slave(bond, slave, iter) {
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
@@ -2962,35 +2962,28 @@ static void bond_mii_monitor(struct work_struct *work)
 	if (!bond_has_slaves(bond))
 		goto re_arm;
 
-	rcu_read_lock();
+	/* Race avoidance with bond_close cancel of workqueue */
+	if (!rtnl_trylock()) {
+		delay = 1;
+		should_notify_peers = false;
+		goto re_arm;
+	}
+
 	should_notify_peers = bond_should_notify_peers(bond);
 	commit = !!bond_miimon_inspect(bond);
 	if (bond->send_peer_notif) {
-		rcu_read_unlock();
-		if (rtnl_trylock()) {
-			bond->send_peer_notif--;
-			rtnl_unlock();
-		}
-	} else {
-		rcu_read_unlock();
+		bond->send_peer_notif--;
 	}
 
 	if (commit) {
-		/* Race avoidance with bond_close cancel of workqueue */
-		if (!rtnl_trylock()) {
-			delay = 1;
-			should_notify_peers = false;
-			goto re_arm;
-		}
-
 		bond_for_each_slave(bond, slave, iter) {
 			bond_commit_link_state(slave, BOND_SLAVE_NOTIFY_LATER);
 		}
 		bond_miimon_commit(bond);
-
-		rtnl_unlock();	/* might sleep, hold no other locks */
 	}
 
+	rtnl_unlock();	/* might sleep, hold no other locks */
+
 re_arm:
 	if (bond->params.miimon)
 		queue_delayed_work(bond->wq, &bond->mii_work, delay);
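
For readers who want the shape of the change without the bonding details, here is a rough userspace analog of the "trylock or re-arm" pattern the patch moves to. It is only a sketch: big_lock, struct monitor, inspect() and monitor_tick() are hypothetical names made up for illustration, a pthread mutex stands in for rtnl, and none of this is kernel code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl mutex; not the kernel's lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct monitor {
	int interval;		/* normal re-arm delay, in ticks */
	int pending_notif;	/* loosely mirrors bond->send_peer_notif */
	int commits;		/* number of times a change was committed */
};

/* Pretend link inspection: reports a change when the flag is set. */
static bool inspect(const bool *link_changed)
{
	return *link_changed;
}

/*
 * One run of the periodic work. Returns the delay to use when
 * re-arming. The whole inspect + commit sequence runs under the
 * lock, so inspect() would be free to call code that sleeps.
 */
static int monitor_tick(struct monitor *m, const bool *link_changed)
{
	/* Never block in the worker: if the lock is busy, retry soon. */
	if (pthread_mutex_trylock(&big_lock) != 0)
		return 1;

	if (m->pending_notif)
		m->pending_notif--;

	if (inspect(link_changed))
		m->commits++;

	pthread_mutex_unlock(&big_lock);
	return m->interval;
}
```

The point of the single trylock at the top is that the worker never waits for the lock (which is what avoids the deadlock with whoever cancels the work while holding it), and everything after it can assume the lock is held.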