Thread: 59 messages, 12 authors, 2016-02-02

Re: Optimizing instruction-cache, more packets at each stage

From: Florian Fainelli <f.fainelli@gmail.com>
Date: 2016-01-25 00:08:53

Hi Jesper

On 18/01/2016 03:54, Jesper Dangaard Brouer wrote:
> On Fri, 15 Jan 2016 15:38:43 +0100 Felix Fietkau [off-list ref] wrote:
>> On 2016-01-15 15:00, Jesper Dangaard Brouer wrote:
>> [...]
>>> The icache is still quite small 32Kb on modern server processors.  I
>>> don't know if smaller embedded processors also have icache and how
>>> large they are.  I speculate this approach would also be a benefit for
>>> them (if they have icache).
>> All of the router devices that I work with have icache. Typical sizes
>> are 32 or 64 KiB. FWIW, I'm really looking forward to having such
>> optimizations in the network stack ;)
> That is very interesting. These kind of icache optimization will then
> likely benefit lower-end devices more than high end Intel CPUs :-)

Typical embedded routers have small I and D caches, but they also have
fairly small cache line sizes (16, 32 or 64 bytes), and not necessarily
an L2 cache to help them. Memory bandwidth is also very limited
(DDR/DDR2 speeds are not uncommon), so the fewer I/D cache lines you
thrash, the better, obviously.

One thing that some HW vendors have done, before they started
introducing HW capable of offloading routing/NAT workloads to
specialized hardware, is to hack the heck out of the Linux network stack
to allow a lightweight SKB structure to be used for forwarding, and to
allocate these "meta" bookkeeping SKBs from a dedicated kmem cache pool
to get relatively predictable latencies.
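
To make the idea concrete, here is a minimal userspace sketch of a
lightweight forwarding descriptor allocated from a dedicated fixed-size
pool. It is not taken from any vendor tree: struct fwd_buf, struct
fwd_pool, FWD_POOL_SIZE and the fwd_* helpers are all hypothetical names,
and in the kernel the pool would more likely be a slab cache
(kmem_cache_create()/kmem_cache_alloc()) than a static array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lightweight descriptor: only what forwarding needs. */
struct fwd_buf {
    uint8_t *data;        /* start of the packet payload */
    uint16_t len;         /* payload length in bytes */
    uint16_t out_port;    /* egress port chosen by the forwarding lookup */
    struct fwd_buf *next; /* free-list link, only used while unallocated */
};

#define FWD_POOL_SIZE 256 /* assumed pool depth, sized up front */

struct fwd_pool {
    struct fwd_buf slots[FWD_POOL_SIZE];
    struct fwd_buf *free_head;
    size_t free_count;
};

/* Thread every slot onto the free list once, at init time. */
static void fwd_pool_init(struct fwd_pool *p)
{
    p->free_head = NULL;
    p->free_count = 0;
    for (size_t i = 0; i < FWD_POOL_SIZE; i++) {
        p->slots[i].next = p->free_head;
        p->free_head = &p->slots[i];
        p->free_count++;
    }
}

/* O(1) pop from the free list; returns NULL when the pool is exhausted. */
static struct fwd_buf *fwd_alloc(struct fwd_pool *p)
{
    struct fwd_buf *b = p->free_head;

    if (!b)
        return NULL;
    p->free_head = b->next;
    p->free_count--;
    b->data = NULL;
    b->len = 0;
    b->out_port = 0;
    b->next = NULL;
    return b;
}

/* O(1) push back onto the free list. */
static void fwd_free(struct fwd_pool *p, struct fwd_buf *b)
{
    /* The descriptor must come from this pool. */
    assert(b >= p->slots && b < p->slots + FWD_POOL_SIZE);
    b->next = p->free_head;
    p->free_head = b;
    p->free_count++;
}
```

Both alloc and free are a couple of pointer updates with no fallback to
a general-purpose allocator, which is where the predictable latency
comes from. The free list is LIFO, so a just-freed descriptor (likely
still in D-cache) is the next one handed out.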

There is also a notion of a dirty pointer within the skbuff itself.
Instead of having e.g. your Ethernet NIC driver do a DMA-API call which
can potentially invalidate the D-cache for an entire 1500-ish byte
Ethernet frame, the packet contents are "valid" up until the dirty
pointer. This is a nice trick if you are just forwarding, but it
requires both the SKB accessors/manipulation functions to check the
pointer and your Ethernet driver to be cooperative as well, so it may
not scale well.
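
A rough userspace sketch of how I read the dirty-pointer idea follows.
It is an illustration only, not Broadcom's code: struct pkt_buf, the
pkt_* helpers and the 32-byte CACHE_LINE are assumptions, and the buffer
is assumed to start on a cache line boundary. Every CPU access goes
through an accessor that records the furthest byte touched, and the
driver then only needs cache maintenance up to that mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* assumed D-cache line size in bytes */

/* Hypothetical packet buffer carrying a dirty mark. */
struct pkt_buf {
    uint8_t *data; /* start of the frame, assumed cache-line aligned */
    size_t len;    /* frame length in bytes */
    size_t dirty;  /* bytes from data that the CPU may have touched */
};

static void pkt_init(struct pkt_buf *p, uint8_t *data, size_t len)
{
    assert(data != NULL);
    p->data = data;
    p->len = len;
    p->dirty = 0; /* nothing touched by the CPU yet */
}

/*
 * Accessor that all CPU reads/writes must go through. It advances the
 * dirty mark to cover the furthest byte requested and returns NULL for
 * out-of-range requests without moving the mark.
 */
static uint8_t *pkt_access(struct pkt_buf *p, size_t off, size_t n)
{
    if (off > p->len || n > p->len - off)
        return NULL;
    if (off + n > p->dirty)
        p->dirty = off + n;
    return p->data + off;
}

/*
 * Number of cache lines a cooperating driver would have to invalidate
 * before handing the buffer back to the hardware.
 */
static size_t pkt_lines_to_invalidate(const struct pkt_buf *p)
{
    return (p->dirty + CACHE_LINE - 1) / CACHE_LINE;
}
```

With these assumptions, a forwarder that only reads the first 34 bytes
(Ethernet plus IPv4 header) of a 1500-byte frame ends up with 2 lines to
invalidate instead of 47 for the whole frame. The cost is the one
mentioned above: any code that touches packet data behind the accessor's
back silently breaks the scheme.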

Broadcom's implementation of such a thing can be found among the files
below. The code is not kernel style compliant, but there might be some
re-usable ideas for you:

NBUFF/FKBUFF/SKBUFF are the actual packet bookkeeping data structures
that replace and/or extend the use of SKBs:

https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/include/linux/nbuff.h
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/net/core/nbuff.c

# Check for CONFIG_MIPS_BRCM changes here:
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/net/core/skbuff.c
https://code.google.com/p/gfiber-gflt100/source/browse/kernel/linux/include/linux/skbuff.h