Re: Optimizing instruction-cache, more packets at each stage

From: Eric Dumazet <hidden>
Date: 2016-01-18 17:01:52

On Mon, 2016-01-18 at 12:54 +0100, Jesper Dangaard Brouer wrote:

> That is very interesting. These kinds of icache optimizations will then
> likely benefit lower-end devices more than high-end Intel CPUs :-)
>
> AFAIK the Intel CPUs are masking this icache problem by having an icache
> prefetcher and optimizing how fast the CPU can load/refill from higher
> level caches.  Intel CPUs have a lot of HW logic around this, which
> I assume the smaller CPUs don't.  E.g. quote from the Intel Optimization
> Reference Manual:
>
>  "The instruction fetch unit (IFU) can fetch up to 16 bytes of aligned
>   instruction bytes each cycle from the instruction cache to the
>   instruction length decoder (ILD). The instruction queue (IQ) buffers
>   the ILD-processed instructions and can deliver up to four instructions
>   in one cycle to the instruction decoder."

This does not tell us how many cores/threads can fetch 16 bytes per cycle.

With more than 36 execution units per socket, the peak performance of a
single unit does not reflect what happens when all units are busy and
contend for shared resources.

If we want to properly exploit the L1 caches of each execution unit, we
need to split the load into a pipeline. But the number of units depends
on hardware capabilities (like L1 cache size), which is hard to code in
a generic way (Linux kernel).
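
As a rough single-core illustration of the stage-splitting idea, the
sketch below contrasts walking each packet through every stage with
running each stage over a whole batch, so that one stage's code can stay
hot in the icache while it runs. Everything in it is hypothetical and
not taken from the kernel: struct pkt, BATCH_MAX, the two toy stages and
the function names are made up for illustration. In a multi-core
pipeline each stage would presumably run on its own unit, which this
sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 64            /* hypothetical batch size */

struct pkt {                    /* hypothetical stand-in for a packet */
    uint32_t len;
    uint32_t csum;
    int      verdict;           /* 1 = accept, 0 = drop */
};

/* Stage 1 (toy): derive a pseudo checksum from the length. */
static void stage_parse(struct pkt *p)
{
    p->csum = p->len * 31u + 7u;
}

/* Stage 2 (toy): accept packets whose pseudo checksum is odd. */
static void stage_classify(struct pkt *p)
{
    p->verdict = (p->csum & 1u) ? 1 : 0;
}

/* Per-packet order: each packet passes through all stages in turn, so
 * the instruction stream alternates between stages for every packet. */
static int run_per_packet(struct pkt *pkts, size_t n)
{
    int accepted = 0;

    for (size_t i = 0; i < n; i++) {
        stage_parse(&pkts[i]);
        stage_classify(&pkts[i]);
        accepted += pkts[i].verdict;
    }
    return accepted;
}

/* Per-stage order: one stage runs over the whole batch before the next
 * stage starts. The result is the same, only the loop order differs. */
static int run_per_stage(struct pkt *pkts, size_t n)
{
    int accepted = 0;

    assert(n <= BATCH_MAX);
    for (size_t i = 0; i < n; i++)
        stage_parse(&pkts[i]);
    for (size_t i = 0; i < n; i++)
        stage_classify(&pkts[i]);
    for (size_t i = 0; i < n; i++)
        accepted += pkts[i].verdict;
    return accepted;
}
```

Both functions return the same count for the same input. Whether the
per-stage order is actually faster depends on the code footprint of the
real stages and on the hardware, which is exactly the part that is hard
to make generic.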

For example, having the same core handle both RX and TX interrupts is
not the best choice, especially when TX interrupts have to call
expensive callbacks to upper layers (TCP Small Queues).
