Thread: 50 messages, 7 authors, 2018-03-28

Re: [RFC PATCH 00/24] Introducing AF_XDP support

From: William Tu <hidden>
Date: 2018-03-28 00:07:32

On Tue, Mar 27, 2018 at 2:37 AM, Jesper Dangaard Brouer
[off-list ref] wrote:
On Mon, 26 Mar 2018 14:58:02 -0700
William Tu [off-list ref] wrote:
Again high count for NMI ?!?

Maybe you just forgot to tell perf that you want it to decode the
bpf_prog correctly?

https://prototype-kernel.readthedocs.io/en/latest/bpf/troubleshooting.html#perf-tool-symbols

Enable via:
 $ sysctl net/core/bpf_jit_kallsyms=1

And use perf report (while BPF is STILL LOADED):

 $ perf report --kallsyms=/proc/kallsyms

E.g. for emailing this you can use this command:

 $ perf report --sort cpu,comm,dso,symbol --kallsyms=/proc/kallsyms --no-children --stdio -g none | head -n 40
Thanks, I followed the steps. Here is the result for l2fwd:
# Total Lost Samples: 119
#
# Samples: 2K of event 'cycles:ppp'
# Event count (approx.): 25675705627
#
# Overhead  CPU  Command  Shared Object       Symbol
# ........  ...  .......  ..................  ..................................
#
    10.48%  013  xdpsock  xdpsock             [.] main
     9.77%  013  xdpsock  [kernel.vmlinux]    [k] clflush_cache_range
     8.45%  013  xdpsock  [kernel.vmlinux]    [k] nmi
     8.07%  013  xdpsock  [kernel.vmlinux]    [k] xsk_sendmsg
     7.81%  013  xdpsock  [kernel.vmlinux]    [k] __domain_mapping
     4.95%  013  xdpsock  [kernel.vmlinux]    [k] ixgbe_xmit_frame_ring
     4.66%  013  xdpsock  [kernel.vmlinux]    [k] skb_store_bits
     4.39%  013  xdpsock  [kernel.vmlinux]    [k] syscall_return_via_sysret
     3.93%  013  xdpsock  [kernel.vmlinux]    [k] pfn_to_dma_pte
     2.62%  013  xdpsock  [kernel.vmlinux]    [k] __intel_map_single
     2.53%  013  xdpsock  [kernel.vmlinux]    [k] __alloc_skb
     2.36%  013  xdpsock  [kernel.vmlinux]    [k] iommu_no_mapping
     2.21%  013  xdpsock  [kernel.vmlinux]    [k] alloc_skb_with_frags
     2.07%  013  xdpsock  [kernel.vmlinux]    [k] skb_set_owner_w
     1.98%  013  xdpsock  [kernel.vmlinux]    [k] __kmalloc_node_track_caller
     1.94%  013  xdpsock  [kernel.vmlinux]    [k] ksize
     1.84%  013  xdpsock  [kernel.vmlinux]    [k] validate_xmit_skb_list
     1.62%  013  xdpsock  [kernel.vmlinux]    [k] kmem_cache_alloc_node
     1.48%  013  xdpsock  [kernel.vmlinux]    [k] __kmalloc_reserve.isra.37
     1.21%  013  xdpsock  xdpsock             [.] xq_enq
     1.08%  013  xdpsock  [kernel.vmlinux]    [k] intel_alloc_iova
You did use net/core/bpf_jit_kallsyms=1 and the correct perf commands for
decoding the bpf_prog, so the #3 entry in the perf output, 'nmi', is likely
a real NMI call... which looks wrong.
Thanks, you're right. Let me dig more on this NMI behavior.
And l2fwd under "perf stat" looks OK to me. There are few context
switches, the CPU is fully utilized, and 1.17 insn per cycle seems OK.

Performance counter stats for 'CPU(s) 6':
  10000.787420      cpu-clock (msec)          #    1.000 CPUs utilized
            24      context-switches          #    0.002 K/sec
             0      cpu-migrations            #    0.000 K/sec
             0      page-faults               #    0.000 K/sec
22,361,333,647      cycles                    #    2.236 GHz
13,458,442,838      stalled-cycles-frontend   #   60.19% frontend cycles idle
26,251,003,067      instructions              #    1.17  insn per cycle
                                              #    0.51  stalled cycles per insn
 4,938,921,868      branches                  #  493.853 M/sec
     7,591,739      branch-misses             #    0.15% of all branches
  10.000835769 seconds time elapsed
This perf stat also indicates something is wrong.

The 1.17 insn per cycle is NOT okay, it is too low (compared to what
I usually see, e.g. 2.36 insn per cycle).

It clearly says you have 'stalled-cycles-frontend' and '60.19% frontend
cycles idle'. This means your CPU has issues/bottlenecks fetching
instructions. Explained by Andi Kleen here [1]

[1] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
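
The derived figures in the perf stat output above can be reproduced from the raw counters. The following is a minimal sketch, not part of the original exchange; the function name and dictionary keys are illustrative assumptions, and the inputs are the cycles, instructions and stalled-cycles-frontend counts quoted above.

```python
# Sketch: recompute the ratios that perf stat prints next to the raw counters.
# The function name and dictionary keys are hypothetical, chosen for illustration.

def derived_metrics(cycles, instructions, stalled_frontend):
    """Return the three ratios perf stat derives from these counters."""
    return {
        # Instructions retired per CPU cycle.
        "insn_per_cycle": instructions / cycles,
        # Share of cycles in which the frontend delivered nothing, in percent.
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        # Frontend stall cycles paid per retired instruction.
        "stalled_per_insn": stalled_frontend / instructions,
    }

# Counter values copied from the perf stat output for CPU 6 above.
metrics = derived_metrics(
    cycles=22_361_333_647,
    instructions=26_251_003_067,
    stalled_frontend=13_458_442_838,
)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The rounded results match the 1.17, 60.19% and 0.51 shown in the output, so the numbers being debated come straight from these three counters.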
Thanks for the link!
It's definitely weird that my frontend (fetch and decode) stalled
cycles are so high.
I assume this xdpsock code is small and should all fit into the icache.
However, doing another perf stat on xdpsock l2fwd shows:

    13,720,109,581      stalled-cycles-frontend   #    60.01% frontend cycles idle        (23.82%)

   <not supported>      stalled-cycles-backend
         7,994,837      branch-misses             #     0.16% of all branches             (23.80%)
       996,874,424      bus-cycles                #    99.679 M/sec                       (23.80%)
    18,942,220,445      ref-cycles                #  1894.067 M/sec                       (28.56%)
       100,983,226      LLC-loads                 #    10.097 M/sec                       (23.80%)
         4,897,089      LLC-load-misses           #     4.85% of all LL-cache hits        (23.80%)
        66,659,889      LLC-stores                #     6.665 M/sec                       (9.52%)
             8,373      LLC-store-misses          #     0.837 K/sec                       (9.52%)
       158,178,410      LLC-prefetches            #    15.817 M/sec                       (9.52%)
         3,011,180      LLC-prefetch-misses       #     0.301 M/sec                       (9.52%)
     8,190,383,109      dTLB-loads                #   818.971 M/sec                       (9.52%)
        20,432,204      dTLB-load-misses          #     0.25% of all dTLB cache hits      (9.52%)
     3,729,504,674      dTLB-stores               #   372.920 M/sec                       (9.52%)
           992,231      dTLB-store-misses         #     0.099 M/sec                       (9.52%)
   <not supported>      dTLB-prefetches
   <not supported>      dTLB-prefetch-misses
            11,619      iTLB-loads                #     0.001 M/sec                       (9.52%)
         1,874,756      iTLB-load-misses          # 16135.26% of all iTLB cache hits      (14.28%)

I have a very high iTLB-load-misses count. This is probably the cause of
the high frontend stall rate.
Do you know any way to improve the iTLB hit rate?
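
For reference, the miss percentages in the output above are the miss counter divided by the matching load counter. The following is a small sketch, not from the original mail, with a hypothetical helper name, that recomputes them from the quoted counts. Since the trailing percentages (9.52% versus 14.28%) show the two iTLB counters were only active for part of the run and at different fractions of it, both counts are scaled estimates; a ratio far above 100% may therefore say more about how the two events are counted than about the real hit rate.

```python
# Sketch: recompute the "X% of all ... hits" columns from the raw counts.
# miss_pct is a hypothetical helper name used only for this illustration.

def miss_pct(misses, loads):
    """Misses expressed as a percentage of the matching load counter."""
    return 100.0 * misses / loads

# Counter values copied from the second perf stat run above.
itlb = miss_pct(misses=1_874_756, loads=11_619)
dtlb = miss_pct(misses=20_432_204, loads=8_190_383_109)
llc = miss_pct(misses=4_897_089, loads=100_983_226)

print(f"iTLB-load-misses: {itlb:.2f}%")
print(f"dTLB-load-misses: {dtlb:.2f}%")
print(f"LLC-load-misses:  {llc:.2f}%")
```

The dTLB and LLC ratios stay well below 100%, while the iTLB ratio comes out above 16000% because the reported iTLB-loads count (11,619) is far smaller than the miss count.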

Thanks
William