Re: [PATCH net-next 3/3] udp: try to avoid 2 cache miss on dequeue
From: Paolo Abeni <pabeni@redhat.com>
Date: 2017-06-01 10:46:30
On Wed, 2017-05-31 at 10:04 -0700, Eric Dumazet wrote:
> On Mon, 2017-05-29 at 17:27 +0200, Paolo Abeni wrote:
>> When udp_recvmsg() is executed, on x86_64 and other archs, most skb
>> fields are on cold cachelines. If the skb is linear and the kernel
>> doesn't need to compute the udp csum, only a handful of skb fields
>> are required by udp_recvmsg().
>> Since we already use skb->dev_scratch to cache hot data, and there
>> are 32 bits unused on 64 bit archs, use that field to cache as much
>> data as we can, and try to prefetch on dequeue the relevant fields
>> that are left out.
>>
>> This can save up to 2 cache misses per packet.
>
> okay ;)
>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 103 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 53fa48d..616132e 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1163,6 +1163,83 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>>  	return ret;
>>  }
>> +/* Copy as much information as possible into skb->dev_scratch to avoid
>> + * possibly multiple cache miss on dequeue();
>> + */
>> +#if BITS_PER_LONG == 64
>> +
>> +/* we can store multiple info here: truesize, len and the bit needed to
>> + * compute skb_csum_unnecessary will be on cold cache lines at recvmsg
>> + * time.
>> + * skb->len can be stored on 16 bits since the udp header has been already
>> + * validated and pulled.
>> + */
>> +struct udp_dev_scratch {
>> +	__u32 truesize;
>> +	__u16 len;
>> +	__u16 is_linear:1;
>> +	__u16 csum_unnecessary:1;
>
> What about
>
> 	u32 truesize;
> 	u16 len;
> 	bool is_linear;
> 	bool csum_unnecessary;
>
> I do not believe the __ prefix is necessary for a local structure
> (not uapi)
>
> Also a plain bool or u8 is faster than a bit field (shorter
> instructions)

Thank you! I like the above! I'll go for 'bool' usage in v2,

Paolo

p.s. I used the bitfield because I initially had an additional, very
ugly patch that saved another cache miss and required one more bit
there, but that patch was such an eyesore that I had to drop it.
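
For reference, the two layouts discussed above can be compared outside the kernel. The following is only a user-space sketch, not code from the patch: the struct names scratch_bitfield and scratch_bool and the helpers scratch_pack() and scratch_unpack() are hypothetical, uint32_t/uint16_t stand in for the kernel's u32/u16, and a uint64_t stands in for the unsigned long skb->dev_scratch on a 64 bit arch. It assumes a compiler that accepts uint16_t bit-fields (GCC and Clang do) and a one-byte bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit-field layout, mirroring the fields in the posted patch
 * (user-space types, hypothetical struct name).
 */
struct scratch_bitfield {
	uint32_t truesize;
	uint16_t len;
	uint16_t is_linear:1;
	uint16_t csum_unnecessary:1;
};

/* Layout suggested in review: one full byte per flag, so reading or
 * writing a flag is a plain byte access with no mask or shift.
 */
struct scratch_bool {
	uint32_t truesize;
	uint16_t len;
	bool is_linear;
	bool csum_unnecessary;
};

/* Both variants must fit in the 64-bit scratch word. */
static_assert(sizeof(struct scratch_bitfield) <= sizeof(uint64_t),
	      "bit-field layout must fit in 64 bits");
static_assert(sizeof(struct scratch_bool) <= sizeof(uint64_t),
	      "bool layout must fit in 64 bits");

/* Store the cached fields into a single 64-bit word. */
static uint64_t scratch_pack(const struct scratch_bool *s)
{
	uint64_t word = 0;

	memcpy(&word, s, sizeof(*s));
	return word;
}

/* Recover the cached fields from the 64-bit word. */
static struct scratch_bool scratch_unpack(uint64_t word)
{
	struct scratch_bool s;

	memcpy(&s, &word, sizeof(s));
	return s;
}
```

Under these assumptions both structs occupy the same number of bytes, so switching to bool does not cost space as long as only two flags are needed. The bit-field variant only becomes useful when more flags than spare bytes are required, which matches the p.s. above.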