Re: Optimizing instruction-cache, more packets at each stage


From: Florian Fainelli <f.fainelli@gmail.com>
Date: 2016-01-25 00:08:53

Hi Jesper

On 18/01/2016 03:54, Jesper Dangaard Brouer wrote:
> On Fri, 15 Jan 2016 15:38:43 +0100 Felix Fietkau [off-list ref] wrote:
>
>> On 2016-01-15 15:00, Jesper Dangaard Brouer wrote:
>> [...]
>>> The icache is still quite small 32Kb on modern server processors.  I
>>> don't know if smaller embedded processors also have icache and how
>>> large they are.  I speculate this approach would also be a benefit for
>>> them (if they have icache).
>>
>> All of the router devices that I work with have icache. Typical sizes
>> are 32 or 64 KiB. FWIW, I'm really looking forward to having such
>> optimizations in the network stack ;)
>
> That is very interesting. These kind of icache optimization will then
> likely benefit lower-end devices more than high end Intel CPUs :-)

Typical embedded routers have small I and D caches, but they also have
fairly small cache line sizes (16, 32 or 64 bytes) and not necessarily
an L2 cache to help them. Memory bandwidth is also very limited
(DDR/DDR2 speeds are not uncommon), so the fewer I/D cache lines you
trash, the better, obviously.

One thing that some HW vendors did, before they started introducing
HW capable of offloading routing/NAT workloads to specialized hardware,
was to hack the heck out of the Linux network stack to allow a
lightweight SKB structure to be used for forwarding, and to allocate
these "meta" bookkeeping SKBs from a dedicated kmem cache pool to get
relatively predictable latencies.
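
To make the dedicated-pool idea concrete, here is a minimal userspace
sketch in C. It is not the vendor code and not a kernel API: the names
light_buf, light_pool and POOL_SIZE are made up for illustration, and a
fixed array with a free list stands in for a dedicated kmem cache. The
point is only that allocation and free are O(1) and never reach the
general-purpose allocator on the forwarding path.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 256 /* assumed pool depth, purely illustrative */

/* Hypothetical lightweight descriptor: far fewer fields than an sk_buff. */
struct light_buf {
    unsigned char *data;         /* start of the frame */
    unsigned int len;            /* frame length in bytes */
    struct light_buf *next_free; /* free-list link while unused */
};

/* Preallocated pool; stands in for a dedicated kmem cache. */
struct light_pool {
    struct light_buf slots[POOL_SIZE];
    struct light_buf *free_list;
    unsigned int in_use;
};

static void light_pool_init(struct light_pool *p)
{
    p->free_list = NULL;
    p->in_use = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* O(1) allocation; returns NULL when the pool is exhausted. */
static struct light_buf *light_buf_alloc(struct light_pool *p,
                                         unsigned char *data,
                                         unsigned int len)
{
    struct light_buf *b = p->free_list;

    if (!b)
        return NULL; /* caller drops the packet or falls back */
    p->free_list = b->next_free;
    b->data = data;
    b->len = len;
    b->next_free = NULL;
    p->in_use++;
    return b;
}

/* O(1) free; LIFO order so the most recently used slot is reused first. */
static void light_buf_free(struct light_pool *p, struct light_buf *b)
{
    assert(b >= p->slots && b < p->slots + POOL_SIZE);
    b->next_free = p->free_list;
    p->free_list = b;
    p->in_use--;
}
```

The LIFO free list is a deliberate choice in this sketch: the slot that
was just freed is the one most likely to still be in the D-cache, which
matters on the small caches described above.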

There is also a notion of a dirty pointer within the skbuff itself.
Instead of having your Ethernet NIC driver do a DMA-API call, which
can potentially invalidate the D-cache for an entire 1500-ish byte
Ethernet frame, the packet contents are "valid" up until the dirty
pointer. That is a nice trick if you are just forwarding, but it
requires both the SKB accessor/manipulation functions to check the
pointer and your Ethernet driver to be cooperative, so it may not
scale well.
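
A rough userspace sketch in C of how such a dirty pointer could work
follows. This is my reading of the idea, not Broadcom's implementation:
fwd_buf, valid_end, fwd_buf_access() and the 32-byte CACHE_LINE are all
hypothetical, and the synced_bytes counter merely stands in for the cost
of a real partial DMA sync or cache invalidate.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 32u /* assumed cache line size in bytes */

/* Hypothetical forwarding buffer with a "dirty pointer" (valid_end). */
struct fwd_buf {
    unsigned char *data;
    size_t len;          /* frame length */
    size_t valid_end;    /* bytes [0, valid_end) are safe for the CPU to read */
    size_t synced_bytes; /* stand-in for the cost of cache maintenance */
};

static size_t round_up_line(size_t n)
{
    return (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* The driver hands over a frame without syncing any of it up front. */
static void fwd_buf_init(struct fwd_buf *b, unsigned char *data, size_t len)
{
    assert(data != NULL);
    b->data = data;
    b->len = len;
    b->valid_end = 0;
    b->synced_bytes = 0;
}

/*
 * Accessor that every reader must go through: guarantees bytes [0, upto)
 * are valid, extending the dirty pointer one cache line at a time.
 * Returns NULL if the request runs past the end of the frame.
 */
static unsigned char *fwd_buf_access(struct fwd_buf *b, size_t upto)
{
    if (upto > b->len)
        return NULL;
    if (upto > b->valid_end) {
        size_t new_end = round_up_line(upto);

        if (new_end > b->len)
            new_end = b->len;
        /* A real driver would invalidate/sync only this range here. */
        b->synced_bytes += new_end - b->valid_end;
        b->valid_end = new_end;
    }
    return b->data;
}
```

With these assumptions, a forwarder that only looks at the Ethernet and
IP headers of a 1500-byte frame pays for two cache lines instead of the
whole frame. The cost is that every accessor has to go through the
check, which is the cooperation requirement mentioned above.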

Broadcom's implementation of such a thing can be found among the files
below. The code is not kernel-style compliant, but there might be some
reusable ideas for you.

NBUFF/FKBUFF/SKBUFF are the actual packet bookkeeping data structures
that replace and/or extend the use of SKBs:

https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/include/linux/nbuff.h
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/net/core/nbuff.c

# Check for CONFIG_MIPS_BRCM changes here:
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/net/core/skbuff.c
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/include/linux/skbuff.h
