Thread (6 messages) 6 messages, 2 authors, 2013-07-31

Re: [PATCH net-next 2/2] openvswitch: Use skb_zerocopy() to prepare skb for upcall

From: Jesse Gross <hidden>
Date: 2013-07-26 01:40:03

On Thu, Jul 25, 2013 at 5:43 AM, Thomas Graf [off-list ref] wrote:
From: Thomas Graf <redacted>

Use of skb_zerocopy() avoids the expensive call to memcpy() when
copying the packet data into the Netlink skb. Completes checksum
through skb_checksum_help() if needed (typicall packet input from
software device) which invalidates some of the gains again.

Stock-RX
-  38.30%       swapper  [kernel.kallsyms]  [k] memcpy
   - memcpy
      + 87.46% queue_userspace_packet
      + 12.54% nla_put
+  24.72%  ovs-vswitchd  libc-2.17.so       [.] __memcpy_ssse3_back
+  13.80%  ovs-vswitchd  [kernel.kallsyms]  [k] memcpy
-   7.68%   ksoftirqd/2  [kernel.kallsyms]  [k] memcpy
   - memcpy
      + 85.83% queue_userspace_packet
      + 14.17% nla_put
-   7.06%   ksoftirqd/3  [kernel.kallsyms]  [k] memcpy
   - memcpy
      + 84.85% queue_userspace_packet
      + 15.15% nla_put
-   4.41%   ksoftirqd/0  [kernel.kallsyms]  [k] memcpy
   - memcpy
      + 83.48% queue_userspace_packet
      + 16.52% nla_put

Zerocopy-RX
+  50.35%  ovs-vswitchd  libc-2.17.so       [.] __memcpy_ssse3_back
-  27.78%  ovs-vswitchd  [kernel.kallsyms]  [k] memcpy
   - memcpy
      + 74.53% ovs_packet_cmd_execute
      + 24.11% nla_put
      + 0.93% ovs_flow_cmd_new_or_set
+  13.49%       swapper  [kernel.kallsyms]  [k] memcpy
+   1.45%   ksoftirqd/3  [kernel.kallsyms]  [k] memcpy
+   1.20%   ksoftirqd/2  [kernel.kallsyms]  [k] memcpy

10Gb remote pktgen, 1200 bytes, randomized flows, w/ UDPCSUM:
                                Hits               Missed          Lost
Stock RX                     731'945            6'315'739     3'606'678
Zerocopy RX                  764'041            6'442'761     3'947'451

local pktgen, 4/6 CPUs, 1200 bytes, randomized flows, UDPCSUM:
                                Hits               Missed          Lost
Stock TX                   2'071'030           17'929'192    16'807'785
Zerocopy TX                1'951'142           18'049'056    16'977'296

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Jesse Gross <redacted>
Cc: Eric Dumazet <redacted>
Thanks for the new version and performance numbers.

Reading the numbers that you provided it seems like this is a win for
received packets and basically a wash for outgoing packets (assuming
that they are using checksum offloading, which I suspect is most of
them). Is that also your conclusion?

A couple of comments:
quoted hunk ↗ jump to hunk
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index f7e3a0d..c5927ad 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -383,10 +383,11 @@ static size_t key_attr_size(void)
 }

 static size_t upcall_msg_size(const struct sk_buff *skb,
-                             const struct nlattr *userdata)
+                             const struct nlattr *userdata,
+                             unsigned int hdrlen)
I think the skb parameter is no longer used by this function.
quoted hunk ↗ jump to hunk
@@ -443,11 +450,39 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
                          nla_len(upcall_info->userdata),
                          nla_data(upcall_info->userdata));

-       nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
+       if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
+               goto out;
Do we expect that this might fail now?
+       nla->nla_len = nla_attr_size(skb->len);
+
+       skb_zerocopy(user_skb, skb, skb->len, hlen);
+
+       /* Align the end of the attribute to NLA_ALIGNTO */
+       plen = NLA_ALIGN(user_skb->len) - user_skb->len;
+       if (plen > 0) {
+               int nr_frags = skb_shinfo(user_skb)->nr_frags;

-       skb_copy_and_csum_dev(skb, nla_data(nla));
+               if (nr_frags) {
+                       skb_frag_t *frag;
+
+                       /* Assumption is made that PAGE_SIZE is always alligned
+                        * to at least NLA_ALIGNTO (4) which means that we it
+                        * should be safe to add the padding bytes to the frag
+                        */
I agree that it should be safe to assume that PAGE_SIZE is a multiple
of the netlink alignment requirements. However, we are calculating the
alignment over the total packet payload but applying the alignment to
the paged portion. Couldn't we have a non-aligned potion in the linear
data area followed by a full page?
+       /* Fix alignment of .nlmsg_len, OVS user space enforces a strict
+        * total message size alignment.
+        */
+       ((struct nlmsghdr *) user_skb->data)->nlmsg_len = NLA_ALIGN(user_skb->len);
Do we still need to do this manually now that we are enforcing
alignment of the payload above?
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help