Thread (49 messages) 49 messages, 9 authors, 2017-11-17

Re: [RFC PATCH 07/14] packet: wire up zerocopy for AF_PACKET V4

From: Björn Töpel <hidden>
Date: 2017-11-03 10:47:26

2017-11-03 4:17 GMT+01:00 Willem de Bruijn [off-list ref]:
On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel [off-list ref] wrote:
quoted
From: Björn Töpel <redacted>

This commits adds support for zerocopy mode. Note that zerocopy mode
requires that the network interface has been bound to the socket using
the bind syscall, and that the corresponding netdev implements the
AF_PACKET V4 ndos.

Signed-off-by: Björn Töpel <redacted>
---
+
+static void packet_v4_disable_zerocopy(struct net_device *dev,
+                                      struct tp4_netdev_parms *zc)
+{
+       struct tp4_netdev_parms params;
+
+       params = *zc;
+       params.command  = TP4_DISABLE;
+
+       (void)dev->netdev_ops->ndo_tp4_zerocopy(dev, &params);
Don't ignore error return codes.
Will fix!
quoted
+static int packet_v4_zerocopy(struct sock *sk, int qp)
+{
+       struct packet_sock *po = pkt_sk(sk);
+       struct socket *sock = sk->sk_socket;
+       struct tp4_netdev_parms *zc = NULL;
+       struct net_device *dev;
+       bool if_up;
+       int ret = 0;
+
+       /* Currently, only RAW sockets are supported.*/
+       if (sock->type != SOCK_RAW)
+               return -EINVAL;
+
+       rtnl_lock();
+       dev = packet_cached_dev_get(po);
+
+       /* Socket needs to be bound to an interface. */
+       if (!dev) {
+               rtnl_unlock();
+               return -EISCONN;
+       }
+
+       /* The device needs to have both the NDOs implemented. */
+       if (!(dev->netdev_ops->ndo_tp4_zerocopy &&
+             dev->netdev_ops->ndo_tp4_xmit)) {
+               ret = -EOPNOTSUPP;
+               goto out_unlock;
+       }
Inconsistent error handling with above test.
Will fix.
quoted
+
+       if (!(po->rx_ring.pg_vec && po->tx_ring.pg_vec)) {
+               ret = -EOPNOTSUPP;
+               goto out_unlock;
+       }
A ring can be unmapped later with packet_set_ring. Should that operation
fail if zerocopy is enabled? After that, it can also change version with
PACKET_VERSION.
You're correct, I've missed this. I need to revisit the scenario when
a ring is unmapped, and recreated. Thanks for pointing this out.
quoted
+
+       if_up = dev->flags & IFF_UP;
+       zc = rtnl_dereference(po->zc);
+
+       /* Disable */
+       if (qp <= 0) {
+               if (!zc)
+                       goto out_unlock;
+
+               packet_v4_disable_zerocopy(dev, zc);
+               rcu_assign_pointer(po->zc, NULL);
+
+               if (if_up) {
+                       spin_lock(&po->bind_lock);
+                       register_prot_hook(sk);
+                       spin_unlock(&po->bind_lock);
+               }
There have been a bunch of race conditions in this bind code. We need
to be very careful with adding more states to the locking, especially when
open coding in multiple locations, as this patch does. I counted at least
four bind locations. See for instance also
http://patchwork.ozlabs.org/patch/813945/
Yeah, the locking schemes in AF_PACKET is pretty convoluted. I'll
document and make the locking more explicit (and avoiding open coding
it).
quoted
+
+               goto out_unlock;
+       }
+
+       /* Enable */
+       if (!zc) {
+               zc = kzalloc(sizeof(*zc), GFP_KERNEL);
+               if (!zc) {
+                       ret = -ENOMEM;
+                       goto out_unlock;
+               }
+       }
+
+       if (zc->queue_pair >= 0)
+               packet_v4_disable_zerocopy(dev, zc);
This calls disable even if zc was freshly allocated.
Shoud be > 0?
Good catch. It should be > 0.
quoted
 static int packet_release(struct socket *sock)
 {
+       struct tp4_netdev_parms *zc;
        struct sock *sk = sock->sk;
+       struct net_device *dev;
        struct packet_sock *po;
        struct packet_fanout *f;
        struct net *net;
@@ -3337,6 +3541,20 @@ static int packet_release(struct socket *sock)
        sock_prot_inuse_add(net, sk->sk_prot, -1);
        preempt_enable();

+       rtnl_lock();
+       zc = rtnl_dereference(po->zc);
+       dev = packet_cached_dev_get(po);
+       if (zc && dev)
+               packet_v4_disable_zerocopy(dev, zc);
+       if (dev)
+               dev_put(dev);
+       rtnl_unlock();
+
+       if (zc) {
+               synchronize_rcu();
+               kfree(zc);
+       }
Please use a helper function for anything this complex.
Will fix.


Thanks,
Björn
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help