Re: [PATCH net] bonding: Fix header_ops type confusion
From: 戸田晃太 <hidden>
Date: 2026-01-15 10:33:49
Also in:
lkml
Hello, Eric and other maintainers, I hope you’re doing well. I’m following up on our email, sent during the holiday season, in case it got buried. When you have a moment, could you please let us know if you had a chance to review it? Thank you in advance, and I look forward to your response. Best regards, Kota Toda 2025年12月22日(月) 17:20 小池悠生 [off-list ref]:
Hello, Eric and other maintainers, I'm deeply sorry to have left the patch suggestion for this long period. I became extremely busy, and that took its toll on my health, causing me to take sick leave for nearly half a year (And my colleague Kota had been waiting for me to come back). As fortunately I've recovered and returned to work, I hope to move forward with this matter as well. Recalling the issue Eric raised, I understand it was a concern about potential race conditions arising from the `bond_header_ops` and `header_slave_dev` added to the `struct bonding`. For example, one could imagine a situation where `header_slave_dev` is rewritten to a different type, and at that exact moment a function from the old `bond_header_ops` gets called, or vice versa. However, I am actually skeptical that this can happen. The reason is that `bond_setup_by_slave` is only called when there are no slaves at all:bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, struct netlink_ext_ack *extack) ... if (!bond_has_slaves(bond)) { ... if (slave_dev->type != ARPHRD_ETHER) bond_setup_by_slave(bond_dev, slave_dev);In other words, in order to trigger a race condition, one would need to remove the slave once and make the slave list empty first. However, as shown below, in `bond_release_and_destroy`, when the slave list becomes empty, it appears that the bond interface itself is removed. This makes it seem impossible to "quickly remove a slave and re-register it.":static int bond_slave_netdev_event(unsigned long event, struct net_device *slave_dev) ... switch (event) { case NETDEV_UNREGISTER: if (bond_dev->type != ARPHRD_ETHER) bond_release_and_destroy(bond_dev, slave_dev); ... } ... /* First release a slave and then destroy the bond if no more slaves are left. * Must be under rtnl_lock when this function is called. */ static int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev) { struct bonding *bond = netdev_priv(bond_dev); int ret; ret = __bond_release_one(bond_dev, slave_dev, false, true); if (ret == 0 && !bond_has_slaves(bond) && bond_dev->reg_state != NETREG_UNREGISTERING) { bond_dev->priv_flags |= IFF_DISABLE_NETPOLL; netdev_info(bond_dev, "Destroying bond\n"); bond_remove_proc_entry(bond); unregister_netdevice(bond_dev); } return ret; }Moreover, as noted in the comments, these functions are executed under the netlink-side lock. Therefore, my conclusion is that a race condition cannot actually occur. I also think that the fact that, even before our patch, these code paths had almost no explicit locking anywhere serves as circumstantial evidence for this view. As Kota said, as far as I saw, the past syzkaller-bot's report is seemingly only NULL pointer dereference due to the root cause we reported, and this patch should fix them. That said, even so, I agree that the kind of countermeasures Eric suggests are worth applying if they do not cause problems in terms of execution speed or code size. However, I am concerned that addressing this with READ_ONCE or RCU would imply a somewhat large amount of rewriting. `header_ops` is designed to allow various types of devices to be handled in an object-oriented way, and as such it is used throughout many parts of the Linux kernel. Using READ_ONCE or RCU every time header_ops is accessed simply because we are worried about a race condition in bond’s header_ops seems to imply changes like the following, for example:diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 92dc1f1788de..d9aad38746ad 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1538,7 +1538,7 @@ static void neigh_hh_init(struct neighbour *n) * hh_cache entry. */ if (!hh->hh_len) - dev->header_ops->cache(n, hh, prot); + READ_ONCE(dev->header_ops->cache)(n, hh, prot); write_unlock_bh(&n->lock); } --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3131,35 +3131,41 @@ static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev, const void *daddr, const void *saddr, unsigned int len) { - if (!dev->header_ops || !dev->header_ops->create) + int (*create)(struct sk_buff *skb, struct net_device *dev, + unsigned short type, const void *daddr, + const void *saddr, unsigned int len); + if (!dev->header_ops || !(create = READ_ONCE(dev->header_ops->create))) return 0; - return dev->header_ops->create(skb, dev, type, daddr, saddr, len); + return create(skb, dev, type, daddr, saddr, len); } static inline int dev_parse_header(const struct sk_buff *skb, unsigned char *haddr) { + int (*parse)(const struct sk_buff *skb, unsigned char *haddr); const struct net_device *dev = skb->dev; - if (!dev->header_ops || !dev->header_ops->parse) + if (!dev->header_ops || !(parse = READ_ONCE(dev->header_ops->parse))) return 0; - return dev->header_ops->parse(skb, haddr); + return parse(skb, haddr); } ... (and so on)It looks like we would end up rewriting on the order of a dozen or so places with this kind of pattern, but from the perspective of the maintainers (or in terms of Linux kernel culture), would this be considered an acceptable change? If this differs from what you intended, please correct me. Best regards, Yuki Koike 2025年5月29日(木) 0:10 Eric Dumazet [off-list ref]:quoted
On Wed, May 28, 2025 at 7:36 AM 戸田晃太 [off-list ref] wrote:quoted
Thank you for your review. 2025年5月26日(月) 17:23 Eric Dumazet [off-list ref]:quoted
On Sun, May 25, 2025 at 10:08 PM 戸田晃太 [off-list ref] wrote:quoted
In bond_setup_by_slave(), the slave’s header_ops are unconditionally copied into the bonding device. As a result, the bonding device may invoke the slave-specific header operations on itself, causing netdev_priv(bond_dev) (a struct bonding) to be incorrectly interpreted as the slave's private-data type. This type-confusion bug can lead to out-of-bounds writes into the skb, resulting in memory corruption. This patch adds two members to struct bonding, bond_header_ops and header_slave_dev, to avoid type-confusion while keeping track of the slave's header_ops. Fixes: 1284cd3a2b740 (bonding: two small fixes for IPoIB support) Signed-off-by: Kota Toda <redacted> Signed-off-by: Yuki Koike <redacted> Co-Developed-by: Yuki Koike <redacted> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reported-by: Kota Toda <redacted> --- drivers/net/bonding/bond_main.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++- include/net/bonding.h | 5 +++++ 2 files changed, 65 insertions(+), 1 deletion(-)diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 8ea183da8d53..690f3e0971d0 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c@@ -1619,14 +1619,65 @@ static void bond_compute_features(struct bonding *bond) netdev_change_features(bond_dev); } +static int bond_hard_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, const void *daddr, + const void *saddr, unsigned int len) +{ + struct bonding *bond = netdev_priv(dev); + struct net_device *slave_dev; + + slave_dev = bond->header_slave_dev; + + return dev_hard_header(skb, slave_dev, type, daddr, saddr, len); +} + +static void bond_header_cache_update(struct hh_cache *hh, conststruct net_device *dev, + const unsigned char *haddr) +{ + const struct bonding *bond = netdev_priv(dev); + struct net_device *slave_dev; + + slave_dev = bond->header_slave_dev;I do not see any barrier ?quoted
+ + if (!slave_dev->header_ops || !slave_dev->header_ops->cache_update) + return; + + slave_dev->header_ops->cache_update(hh, slave_dev, haddr); +} + static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + struct bonding *bond = netdev_priv(bond_dev); bool was_up = !!(bond_dev->flags & IFF_UP); dev_close(bond_dev); - bond_dev->header_ops = slave_dev->header_ops; + /* Some functions are given dev as an argument + * while others not. When dev is not given, we cannot + * find out what is the slave device through struct bonding + * (the private data of bond_dev). Therefore, we need a raw + * header_ops variable instead of its pointer to const header_ops + * and assign slave's functions directly. + * For the other case, we set the wrapper functions that pass + * slave_dev to the wrapped functions. + */ + bond->bond_header_ops.create = bond_hard_header; + bond->bond_header_ops.cache_update = bond_header_cache_update; + if (slave_dev->header_ops) { + bond->bond_header_ops.parse = slave_dev->header_ops->parse; + bond->bond_header_ops.cache = slave_dev->header_ops->cache; + bond->bond_header_ops.validate = slave_dev->header_ops->validate; + bond->bond_header_ops.parse_protocol = slave_dev->header_ops->parse_protocol;All these updates probably need WRITE_ONCE(), and corresponding READ_ONCE() on reader sides, at a very minimum ... RCU would even be better later.I believe that locking is not necessary in this patch. The update of `header_ops` only happens when a slave is newly enslaved to a bond. Under such circumstances, members of `header_ops` are not called in parallel with updating. Therefore, there is no possibility of race conditions occurring.bond_dev can certainly be live, and packets can flow. I have seen enough syzbot reports hinting at this precise issue.