Re: [RFC PATCH 07/14] packet: wire up zerocopy for AF_PACKET V4
From: Björn Töpel <hidden>
Date: 2017-11-03 10:47:26
2017-11-03 4:17 GMT+01:00 Willem de Bruijn [off-list ref]:
On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel [off-list ref] wrote:quoted
From: Björn Töpel <redacted> This commits adds support for zerocopy mode. Note that zerocopy mode requires that the network interface has been bound to the socket using the bind syscall, and that the corresponding netdev implements the AF_PACKET V4 ndos. Signed-off-by: Björn Töpel <redacted> --- + +static void packet_v4_disable_zerocopy(struct net_device *dev, + struct tp4_netdev_parms *zc) +{ + struct tp4_netdev_parms params; + + params = *zc; + params.command = TP4_DISABLE; + + (void)dev->netdev_ops->ndo_tp4_zerocopy(dev, ¶ms);Don't ignore error return codes.
Will fix!
quoted
+static int packet_v4_zerocopy(struct sock *sk, int qp) +{ + struct packet_sock *po = pkt_sk(sk); + struct socket *sock = sk->sk_socket; + struct tp4_netdev_parms *zc = NULL; + struct net_device *dev; + bool if_up; + int ret = 0; + + /* Currently, only RAW sockets are supported.*/ + if (sock->type != SOCK_RAW) + return -EINVAL; + + rtnl_lock(); + dev = packet_cached_dev_get(po); + + /* Socket needs to be bound to an interface. */ + if (!dev) { + rtnl_unlock(); + return -EISCONN; + } + + /* The device needs to have both the NDOs implemented. */ + if (!(dev->netdev_ops->ndo_tp4_zerocopy && + dev->netdev_ops->ndo_tp4_xmit)) { + ret = -EOPNOTSUPP; + goto out_unlock; + }Inconsistent error handling with above test.
Will fix.
quoted
+ + if (!(po->rx_ring.pg_vec && po->tx_ring.pg_vec)) { + ret = -EOPNOTSUPP; + goto out_unlock; + }A ring can be unmapped later with packet_set_ring. Should that operation fail if zerocopy is enabled? After that, it can also change version with PACKET_VERSION.
You're correct, I've missed this. I need to revisit the scenario when a ring is unmapped, and recreated. Thanks for pointing this out.
quoted
+ + if_up = dev->flags & IFF_UP; + zc = rtnl_dereference(po->zc); + + /* Disable */ + if (qp <= 0) { + if (!zc) + goto out_unlock; + + packet_v4_disable_zerocopy(dev, zc); + rcu_assign_pointer(po->zc, NULL); + + if (if_up) { + spin_lock(&po->bind_lock); + register_prot_hook(sk); + spin_unlock(&po->bind_lock); + }There have been a bunch of race conditions in this bind code. We need to be very careful with adding more states to the locking, especially when open coding in multiple locations, as this patch does. I counted at least four bind locations. See for instance also http://patchwork.ozlabs.org/patch/813945/
Yeah, the locking schemes in AF_PACKET is pretty convoluted. I'll document and make the locking more explicit (and avoiding open coding it).
quoted
+ + goto out_unlock; + } + + /* Enable */ + if (!zc) { + zc = kzalloc(sizeof(*zc), GFP_KERNEL); + if (!zc) { + ret = -ENOMEM; + goto out_unlock; + } + } + + if (zc->queue_pair >= 0) + packet_v4_disable_zerocopy(dev, zc);This calls disable even if zc was freshly allocated. Shoud be > 0?
Good catch. It should be > 0.
quoted
static int packet_release(struct socket *sock) { + struct tp4_netdev_parms *zc; struct sock *sk = sock->sk; + struct net_device *dev; struct packet_sock *po; struct packet_fanout *f; struct net *net;@@ -3337,6 +3541,20 @@ static int packet_release(struct socket *sock) sock_prot_inuse_add(net, sk->sk_prot, -1); preempt_enable(); + rtnl_lock(); + zc = rtnl_dereference(po->zc); + dev = packet_cached_dev_get(po); + if (zc && dev) + packet_v4_disable_zerocopy(dev, zc); + if (dev) + dev_put(dev); + rtnl_unlock(); + + if (zc) { + synchronize_rcu(); + kfree(zc); + }Please use a helper function for anything this complex.
Will fix. Thanks, Björn