Thread: 17 messages, 5 authors, 2017-06-07

Re: [PATCH net-next 3/3] udp: try to avoid 2 cache miss on dequeue

From: Paolo Abeni <pabeni@redhat.com>
Date: 2017-06-01 10:46:30

On Wed, 2017-05-31 at 10:04 -0700, Eric Dumazet wrote:
On Mon, 2017-05-29 at 17:27 +0200, Paolo Abeni wrote:
when udp_recvmsg() is executed, on x86_64 and other archs, most skb
fields are on cold cachelines.
If the skb is linear and the kernel doesn't need to compute the udp
csum, only a handful of skb fields are required by udp_recvmsg().
Since we already use skb->dev_scratch to cache hot data, and
there are 32 bits unused on 64 bit archs, use such field to cache
as much data as we can, and try to prefetch on dequeue the relevant
fields that are left out.

This can save up to 2 cache misses per packet.
okay ;)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 53fa48d..616132e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1163,6 +1163,83 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	return ret;
 }
 
+/* Copy as much information as possible into skb->dev_scratch to avoid
+ * possibly multiple cache miss on dequeue();
+ */
+#if BITS_PER_LONG == 64
+
+/* we can store multiple info here: truesize, len and the bit needed to
+ * compute skb_csum_unnecessary will be on cold cache lines at recvmsg
+ * time.
+ * skb->len can be stored on 16 bits since the udp header has been already
+ * validated and pulled.
+ */
+struct udp_dev_scratch {
+	__u32 truesize;
+	__u16 len;
+	__u16 is_linear:1;
+	__u16 csum_unnecessary:1;
What about 
	u32   truesize;
	u16   len;
	bool  is_linear;
	bool  csum_unnecessary;

I do not believe the __ prefix is necessary for a local structure (not
uapi).

Also, a plain bool or u8 is faster than a bit field (shorter
instructions).
Thank you! I like the above! I'll go for 'bool' usage in v2,

Paolo

p.s. I used the bitfield because I initially had an additional, very
ugly, patch saving another cache miss and requiring one more bit there,
but that patch was such an eyesore that I had to drop it.
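
For illustration only, here is a minimal user-space sketch of the layout
suggested in the review. It assumes C11 and a one-byte bool. The *_sketch
names and the pack/unpack helpers are hypothetical and are not part of the
patch; the real code stores the data in skb->dev_scratch. The sketch only
shows that the four fields still fit in one 64-bit scratch word when plain
bool members replace the bit fields.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the reviewed structure:
 * fixed-width types replace the kernel's u32/u16.
 */
struct udp_dev_scratch_sketch {
	uint32_t truesize;
	uint16_t len;
	bool is_linear;
	bool csum_unnecessary;
};

/* The whole point of the layout: everything must fit in one 64-bit word. */
_Static_assert(sizeof(struct udp_dev_scratch_sketch) <= sizeof(uint64_t),
	       "scratch data must fit in a 64-bit word");

/* Pack the hot fields into a single 64-bit word (hypothetical helper). */
static uint64_t scratch_pack_sketch(uint32_t truesize, uint16_t len,
				    bool is_linear, bool csum_unnecessary)
{
	struct udp_dev_scratch_sketch s = {
		.truesize = truesize,
		.len = len,
		.is_linear = is_linear,
		.csum_unnecessary = csum_unnecessary,
	};
	uint64_t word = 0;

	/* memcpy avoids aliasing problems when reinterpreting the bytes. */
	memcpy(&word, &s, sizeof(s));
	return word;
}

/* Recover the fields from the 64-bit word (hypothetical helper). */
static struct udp_dev_scratch_sketch scratch_unpack_sketch(uint64_t word)
{
	struct udp_dev_scratch_sketch s;

	memcpy(&s, &word, sizeof(s));
	return s;
}
```

With bool members each flag is read with a plain byte load, whereas the
bit-field variant needs a mask or shift on every access; that is the
trade-off the review comment refers to.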