Re: [syzbot] [net?] WARNING in __linkwatch_sync_dev (2)
From: Stanislav Fomichev <hidden>
Date: 2025-06-13 01:09:04
Also in:
lkml
Subsystem:
bonding driver, networking drivers, the rest · Maintainers:
Jay Vosburgh, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds
On 06/11, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    f09079bd04a9 Merge tag 'powerpc-6.16-2' of git://git.kerne..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16e9260c580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e24211089078d6c6
> dashboard link: https://syzkaller.appspot.com/bug?extid=b8c48ea38ca27d150063
> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-f09079bd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/ef68cb3d29a3/vmlinux-f09079bd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/1cc9431b9a15/bzImage-f09079bd.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> RTNL: assertion failed at ./include/net/netdev_lock.h (72)
> WARNING: CPU: 0 PID: 1141 at ./include/net/netdev_lock.h:72 netdev_ops_assert_locked include/net/netdev_lock.h:72 [inline]
> WARNING: CPU: 0 PID: 1141 at ./include/net/netdev_lock.h:72 __linkwatch_sync_dev+0x1ed/0x230 net/core/link_watch.c:279
>  ethtool_op_get_link+0x1d/0x70 net/ethtool/ioctl.c:63
>  bond_check_dev_link+0x3f9/0x710 drivers/net/bonding/bond_main.c:863
>  bond_miimon_inspect drivers/net/bonding/bond_main.c:2745 [inline]
>  bond_mii_monitor+0x3c0/0x2dc0 drivers/net/bonding/bond_main.c:2967
>  process_one_work+0x9cf/0x1b70 kernel/workqueue.c:3238
>  process_scheduled_works kernel/workqueue.c:3321 [inline]
>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3402
>  kthread+0x3c5/0x780 kernel/kthread.c:464
>  ret_from_fork+0x5d4/0x6f0 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>
netdev_ops_assert_locked is called for a non-ops-locked netdev, so we
hit the ASSERT_RTNL case, which is a bit misleading. But I also noticed
that bond_miimon_inspect runs under the RCU read lock, which won't work
for ops-locked devices :-/ (we want to grab the instance lock for the
CHANGE notifiers).
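A rough userspace model of the dispatch described above may help: as far as I understand include/net/netdev_lock.h, netdev_ops_assert_locked() checks the per-device instance lock for ops-locked devices and falls back to ASSERT_RTNL() for everything else, which is why a plain (non-ops-locked) slave reports an RTNL failure from the miimon path. This is a sketch, not kernel code: the model_* names are hypothetical, and booleans stand in for netdev_need_ops_lock() and lockdep state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device (model only). */
struct model_netdev {
	bool need_ops_lock;      /* models netdev_need_ops_lock(dev) */
	bool instance_lock_held; /* models "dev->lock is held" */
};

enum model_assert_result {
	MODEL_ASSERT_OK,
	MODEL_ASSERT_INSTANCE_LOCK_MISSING, /* would be a lockdep splat */
	MODEL_ASSERT_RTNL_MISSING,          /* "RTNL: assertion failed" */
};

/* Models the branch in netdev_ops_assert_locked(): which lock is checked
 * depends only on whether the device is ops-locked. */
enum model_assert_result
model_ops_assert_locked(const struct model_netdev *dev, bool rtnl_held)
{
	assert(dev != NULL);

	if (dev->need_ops_lock)
		return dev->instance_lock_held ?
			MODEL_ASSERT_OK : MODEL_ASSERT_INSTANCE_LOCK_MISSING;

	/* Non-ops-locked device: fall back to the RTNL check. */
	return rtnl_held ? MODEL_ASSERT_OK : MODEL_ASSERT_RTNL_MISSING;
}
```

In the miimon case the caller holds only the RCU read lock, so for a plain slave the model returns MODEL_ASSERT_RTNL_MISSING; for an ops-locked slave it would instead complain about the instance lock, which cannot be taken inside an RCU read-side section anyway.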
I'm contemplating dropping RCU and using rtnl_trylock() instead. Looking
at commit f0c76d61779b ("bonding: refactor mii monitor"), it doesn't look
like there were issues with rtnl performance, so hopefully that should be
ok. The reason I want rtnl here is this trace, which I remember from my
recent patches:
[ 3456.656261] ? ipv6_add_dev+0x370/0x620
[ 3456.660039] ipv6_find_idev+0x96/0xe0
[ 3456.660445] addrconf_add_dev+0x1e/0xa0
[ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720
[ 3456.661803] addrconf_notify+0x35f/0x8d0
[ 3456.662236] notifier_call_chain+0x38/0xf0
[ 3456.662676] netdev_state_change+0x65/0x90
[ 3456.663112] linkwatch_do_dev+0x5a/0x70
Here linkwatch_do_dev (which can be reached from bond_check_dev_link via
ethtool_op_get_link) may trigger IPv6 address assignment, so I'm not sure
how this is all supposed to work under RCU and without the rtnl lock.
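The concern can be illustrated with a small userspace model of the call chain in the trace (linkwatch_do_dev -> netdev_state_change -> notifier chain -> addrconf_notify): a counter stands in for rcu_read_lock() nesting, a flag stands in for RTNL, and an addrconf-like handler checks both the way might_sleep() and ASSERT_RTNL() would. Everything here is a hypothetical sketch (all model_* names are made up); it only shows why running the chain from an RCU section without RTNL is a problem, not how the kernel implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model state: RCU nesting depth, RTNL ownership, and counters that
 * record what would be kernel splats ("sleeping function called from
 * invalid context" and "RTNL: assertion failed"). */
static int model_atomic_depth;
static bool model_rtnl_held;
static int model_sleep_violations;
static int model_rtnl_violations;

void model_rcu_read_lock(void) { model_atomic_depth++; }

void model_rcu_read_unlock(void)
{
	assert(model_atomic_depth > 0);
	model_atomic_depth--;
}

/* Models might_sleep() followed by ASSERT_RTNL() in a handler. */
static void model_check_context(void)
{
	if (model_atomic_depth > 0)
		model_sleep_violations++;
	if (!model_rtnl_held)
		model_rtnl_violations++;
}

typedef void (*model_notifier_fn)(void);

/* Stand-in for addrconf_notify(): may allocate and expects RTNL. */
static void model_addrconf_notify(void) { model_check_context(); }

/* Stand-in for a handler that neither sleeps nor needs RTNL. */
static void model_passive_notify(void) { }

/* Models linkwatch_do_dev() -> netdev_state_change() -> notifiers. */
void model_state_change(void)
{
	static const model_notifier_fn chain[] = {
		model_passive_notify,
		model_addrconf_notify,
	};

	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		chain[i]();
}
```

Running model_state_change() between model_rcu_read_lock() and model_rcu_read_unlock() with model_rtnl_held false records one violation of each kind; running it with only model_rtnl_held set records none, which matches the direction of the patch below.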
Tentatively (untested, uncompiled):
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c4d53e8e7c15..e2c4bcdb8b1a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2739,7 +2739,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 			ignore_updelay = true;
 	}
 
-	bond_for_each_slave_rcu(bond, slave, iter) {
+	bond_for_each_slave(bond, slave, iter) {
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
@@ -2962,35 +2962,28 @@ static void bond_mii_monitor(struct work_struct *work)
 	if (!bond_has_slaves(bond))
 		goto re_arm;
 
-	rcu_read_lock();
+	/* Race avoidance with bond_close cancel of workqueue */
+	if (!rtnl_trylock()) {
+		delay = 1;
+		should_notify_peers = false;
+		goto re_arm;
+	}
+
 	should_notify_peers = bond_should_notify_peers(bond);
 	commit = !!bond_miimon_inspect(bond);
 	if (bond->send_peer_notif) {
-		rcu_read_unlock();
-		if (rtnl_trylock()) {
-			bond->send_peer_notif--;
-			rtnl_unlock();
-		}
-	} else {
-		rcu_read_unlock();
+		bond->send_peer_notif--;
 	}
 
 	if (commit) {
-		/* Race avoidance with bond_close cancel of workqueue */
-		if (!rtnl_trylock()) {
-			delay = 1;
-			should_notify_peers = false;
-			goto re_arm;
-		}
-
 		bond_for_each_slave(bond, slave, iter) {
 			bond_commit_link_state(slave, BOND_SLAVE_NOTIFY_LATER);
 		}
 		bond_miimon_commit(bond);
-
-		rtnl_unlock();	/* might sleep, hold no other locks */
 	}
 
+	rtnl_unlock();	/* might sleep, hold no other locks */
+
 re_arm:
 	if (bond->params.miimon)
 		queue_delayed_work(bond->wq, &bond->mii_work, delay);
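The locking shape of the tentative patch can be sketched in userspace: a pthread mutex stands in for RTNL, and the function returns the delay the work item would be re-queued with. The trylock (rather than a blocking lock) follows the existing "Race avoidance with bond_close cancel of workqueue" comment; my reading is that bond_close() cancels this work synchronously while already holding RTNL, so blocking on RTNL here could deadlock. The model_* names, fields and delay values are hypothetical and illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for RTNL (model only). */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct model_bond {
	int miimon_delay;    /* normal re-arm delay */
	int send_peer_notif; /* pending peer notifications */
	int commits;         /* times the commit step ran */
	bool link_changed;   /* result of the inspect step */
};

/* Returns the delay the monitor would re-arm with. */
int model_mii_monitor(struct model_bond *bond)
{
	assert(bond != NULL);

	/* Never block: if the lock is busy (e.g. the close path is
	 * waiting for this work), retry on the next tick without
	 * touching bond state. The kernel code also clears
	 * should_notify_peers here. */
	if (pthread_mutex_trylock(&model_rtnl) != 0)
		return 1;

	/* Inspect, peer-notification accounting and commit all run
	 * under the one lock: no RCU section, no second trylock. */
	if (bond->send_peer_notif)
		bond->send_peer_notif--;
	if (bond->link_changed)
		bond->commits++;

	pthread_mutex_unlock(&model_rtnl);
	return bond->miimon_delay;
}
```

With the mutex free, the model commits and returns the normal delay; while another holder owns the mutex, it returns 1 and leaves the bond untouched, mirroring the `delay = 1; goto re_arm;` path.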