Thread: 6 messages, 3 authors, 2019-12-02

Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance

From: Paweł Staszewski <hidden>
Date: 2019-12-02 16:24:16

On 02.12.2019 at 11:53, Paolo Abeni wrote:
On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
On 01.12.2019 at 17:05, David Ahern wrote:
On 11/29/19 4:00 PM, Paweł Staszewski wrote:
As always - each year I need to summarize network performance for
routing applications like a Linux router on the native Linux kernel
(without xdp/dpdk/vpp etc.) :)
Do you keep past profiles? How does this profile (and traffic rates)
compare to older kernels - e.g., 5.0 or 4.19?
Yes - so for 4.19:

Max bandwidth was about 40-42Gbit/s RX / 40-42Gbit/s TX of
forwarded(routed) traffic

And after "order-0 pages" patches - max was 50Gbit/s RX + 50Gbit/s TX
(forwarding - bandwidth max)

(current kernel almost doubled this)
Looks like we are on the good track ;)

[...]
After "order-0 pages" patch

     PerfTop:  104692 irqs/sec  kernel:99.5%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


       9.06%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
       6.43%  [kernel]       [k] tasklet_action_common.isra.21
       5.68%  [kernel]       [k] fib_table_lookup
       4.89%  [kernel]       [k] irq_entries_start
       4.53%  [kernel]       [k] mlx5_eq_int
       4.10%  [kernel]       [k] build_skb
       3.39%  [kernel]       [k] mlx5e_poll_tx_cq
       3.38%  [kernel]       [k] mlx5e_sq_xmit
       2.73%  [kernel]       [k] mlx5e_poll_rx_cq
Compared to the current kernel perf figures, it looks like most of the
gains come from driver changes.

[... current perf figures follow ...]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


       7.56%  [kernel]       [k] __dev_queue_xmit
This is a bit surprising to me. I guess this is due to
'__dev_queue_xmit()' being called twice per packet (team, NIC) and due
to the retpoline overhead.
       1.74%  [kernel]       [k] tcp_gro_receive
If the reference use-case is with a quite large number of concurrent
flows, I guess you can try disabling GRO.
Disabling GRO with teamed interfaces is not good, because after disabling
GRO on the physical interfaces CPU load is about 10% higher on all cores.
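
For anyone who wants to repeat the GRO on/off comparison, here is a minimal sketch of how the toggle could be scripted. It relies on the standard `ethtool -K <iface> gro on|off` command; the interface names are placeholders, and the thread does not show the exact commands that were used, so treat this as an assumption rather than the original procedure.

```python
import subprocess
from typing import List


def gro_command(iface: str, enable: bool) -> List[str]:
    """Build the ethtool argv that switches GRO on or off for one interface."""
    return ["ethtool", "-K", iface, "gro", "on" if enable else "off"]


def set_gro(ifaces: List[str], enable: bool, dry_run: bool = True) -> List[List[str]]:
    """Apply the GRO setting to every interface in the list.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be reviewed first.
    """
    commands = [gro_command(iface, enable) for iface in ifaces]
    if not dry_run:
        for cmd in commands:
            # Needs root privileges and the ethtool binary on the PATH.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical member ports of team0; replace with the real NIC names.
for cmd in set_gro(["enp1s0f0", "enp1s0f1"], enable=False):
    print(" ".join(cmd))
```

Running it as shown only prints the commands; pass `dry_run=False` to actually change the offload setting.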

An observation:

Enabled GRO on interfaces vs team0 packets per second:

   iface                   Rx                   Tx                Total
==============================================================================
             team0:     5952483.50 KB/s      6028436.50 KB/s     11980919.00 KB/s
----------------------------------------------------------------------------

And softnetstats:

CPU          total/sec     dropped/sec    squeezed/sec   collision/sec      rx_rps/sec  flow_limit/sec
CPU:00         1014977               0              35               0               0               0
CPU:01         1074461               0              30               0               0               0
CPU:02         1020460               0              34               0               0               0
CPU:03         1077624               0              34               0               0               0
CPU:04         1005102               0              32               0               0               0
CPU:05         1097107               0              46               0               0               0
CPU:06          997877               0              24               0               0               0
CPU:07         1056216               0              34               0               0               0
CPU:08          856567               0              34               0               0               0
CPU:09          862527               0              23               0               0               0
CPU:10          876107               0              34               0               0               0
CPU:11          759275               0              27               0               0               0
CPU:12          817307               0              27               0               0               0
CPU:13          868073               0              21               0               0               0
CPU:14          837783               0              34               0               0               0
CPU:15          817946               0              27               0               0               0
CPU:16          785500               0              25               0               0               0
CPU:17          851276               0              28               0               0               0
CPU:18          843888               0              29               0               0               0
CPU:19          924840               0              34               0               0               0
CPU:20          884879               0              37               0               0               0
CPU:21          841461               0              28               0               0               0
CPU:22          819436               0              32               0               0               0
CPU:23          872843               0              32               0               0               0

Summed:       21863531               0             740               0               0               0
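
The thread does not say which tool produced the per-CPU rates above. As a rough sketch, similar numbers can be derived from two snapshots of `/proc/net/softnet_stat`. The column positions used below (hexadecimal fields 0, 1, 2, 8, 9 and 10) follow the commonly documented layout of that file and should be treated as an assumption; the function names are made up for this example.

```python
import time
from typing import Dict, List

# Assumed column positions in /proc/net/softnet_stat (all values are hex).
FIELDS = {
    "total": 0,       # packets processed
    "dropped": 1,     # dropped because the backlog queue was full
    "squeezed": 2,    # time_squeeze: poll budget or time ran out
    "collision": 8,
    "rx_rps": 9,
    "flow_limit": 10,
}


def parse_softnet(text: str) -> List[Dict[str, int]]:
    """Parse the file content into one dict of counters per line (per CPU)."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue
        rows.append({name: int(cols[idx], 16)
                     for name, idx in FIELDS.items() if idx < len(cols)})
    return rows


def per_second(before: List[Dict[str, int]], after: List[Dict[str, int]],
               interval_s: float) -> List[Dict[str, float]]:
    """Turn two counter snapshots into per-second rates, row by row."""
    return [{key: (a[key] - b[key]) / interval_s for key in a}
            for b, a in zip(before, after)]


def sample(path: str = "/proc/net/softnet_stat",
           interval_s: float = 1.0) -> List[Dict[str, float]]:
    """Read the file twice, interval_s apart, and return the rates."""
    with open(path) as fh:
        first = parse_softnet(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        second = parse_softnet(fh.read())
    return per_second(first, second, interval_s)
```

Calling `sample()` on the router and summing the `total` values over all rows would give a figure comparable to the "Summed" line.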


Disabled GRO on interfaces vs team0 packets per second:

   iface                   Rx                   Tx                Total
==============================================================================
             team0:     5952483.50 KB/s      6028436.50 KB/s     11980919.00 KB/s
----------------------------------------------------------------------------

And softnet stat:

CPU          total/sec     dropped/sec    squeezed/sec   collision/sec      rx_rps/sec  flow_limit/sec
CPU:00          625288               0              23               0               0               0
CPU:01          605239               0              24               0               0               0
CPU:02          644965               0              26               0               0               0
CPU:03          620264               0              30               0               0               0
CPU:04          603416               0              25               0               0               0
CPU:05          597838               0              23               0               0               0
CPU:06          580028               0              22               0               0               0
CPU:07          604274               0              23               0               0               0
CPU:08          556119               0              26               0               0               0
CPU:09          494997               0              23               0               0               0
CPU:10          514759               0              23               0               0               0
CPU:11          500333               0              22               0               0               0
CPU:12          497956               0              23               0               0               0
CPU:13          535194               0              14               0               0               0
CPU:14          504304               0              24               0               0               0
CPU:15          489015               0              18               0               0               0
CPU:16          487249               0              24               0               0               0
CPU:17          472023               0              23               0               0               0
CPU:18          539454               0              24               0               0               0
CPU:19          499901               0              19               0               0               0
CPU:20          479945               0              26               0               0               0
CPU:21          486800               0              29               0               0               0
CPU:22          466916               0              26               0               0               0
CPU:23          559730               0              34               0               0               0

Summed:       12966008               0             573               0               0               0
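
To put the two "Summed" lines side by side, the short calculation below works out the relative difference between the GRO-enabled and GRO-disabled softnet totals. It uses only the figures reported in this message; the helper name is made up for the example, and it makes no claim about why the counters differ.

```python
# Summed softnet counters as reported in this message.
GRO_ENABLED = {"total": 21_863_531, "squeezed": 740}
GRO_DISABLED = {"total": 12_966_008, "squeezed": 573}


def relative_change(before: float, after: float) -> float:
    """Fractional change going from `before` to `after` (negative = lower)."""
    return (after - before) / before


for counter in ("total", "squeezed"):
    change = relative_change(GRO_ENABLED[counter], GRO_DISABLED[counter])
    print(f"{counter}/sec with GRO disabled vs enabled: {change:+.1%}")
```

With these inputs the summed total/sec is roughly 41% lower with GRO disabled, while the team0 throughput shown in both tables is identical.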

Maybe it will be better without team.
Cheers,

Paolo
-- 
Paweł Staszewski