Thread: 28 messages, 6 authors, 2020-03-23

Re: [PATCH] net: Make skb_segment not to compute checksum if network controller supports checksumming

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2020-02-28 14:31:38

On Fri, Feb 28, 2020 at 12:25 AM Yadu Kishore [off-list ref] wrote:
> > Did you measure a cycle efficiency improvement? As discussed in the
> > referred email thread, the kernel uses checksum_and_copy because it is
> > generally not significantly more expensive than copy alone.
> > skb_segment already is a very complex function. New code needs to
> > offer a tangible benefit.
>
> I ran iperf TCP Tx traffic of 1000 megabytes and captured the cpu cycle
> utilization using perf:
> "perf record -e cycles -a iperf \
> -c 192.168.2.53 -p 5002 -fm -n 1048576000 -i 2  -l 8k -w 8m"
>
> I see the following are the top consumers of cpu cycles:
>
> Function                    %cpu cycles
> ========                    ===========
> skb_mac_gso_segment                0.02
> inet_gso_segment                   0.26
> tcp4_gso_segment                   0.02
> tcp_gso_segment                    0.19
> skb_segment                        0.52
> skb_copy_and_csum_bits             0.64
> do_csum                            7.25
> memcpy                             3.71
> __alloc_skb                        0.91
> ========                    ===========
> SUM                               13.52
>
> The measurement was done on an arm64 hikey960 platform running android with
> linux kernel ver 4.19.23.
> I see that 7.25% of the cpu cycles is spent computing the checksum against the
> total of 13.52% of cpu cycles.
> Which means around 53.6% of the listed cycles is spent doing checksum.
> Hence the attempt to try to offload checksum in the case of GSO also.
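
As a sanity check on the arithmetic in the table, the percentages can be re-summed and the do_csum share recomputed. This is only a small sketch: the array simply restates the nine rows above in hundredths of a percent, and the helper names (sum_x100, share_bp) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Rows of the perf table above, in hundredths of a percent (0.02 -> 2). */
static const int cycles_x100[] = { 2, 26, 2, 19, 52, 64, 725, 371, 91 };
#define DO_CSUM_IDX 6 /* position of the do_csum row in the array */

/* Sum n entries. */
static int sum_x100(const int *v, size_t n)
{
	int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += v[i];
	return total;
}

/* Share of part in total, in basis points (1/100 of a percent), truncated. */
static int share_bp(int part, int total)
{
	assert(total > 0);
	return (int)((long)part * 10000 / total);
}
```

With these inputs the sum is 1352 (13.52%) and do_csum accounts for 5362 basis points of it, i.e. roughly 53.6% of the listed cycles.
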
Can you contrast this against a run with your changes? The thought is
that the majority of this cost is due to the memory loads and stores, not
the arithmetic ops to compute the checksum. When enabling checksum
offload, the same stalls will occur, but will simply be attributed to
memcpy instead of to do_csum. A:B comparisons of absolute (-n) cycle
counts are usually very noisy, but it's worth a shot.
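
To make the point about loads and stores concrete, here is a rough userspace sketch, not kernel code. It contrasts a two-pass "copy, then checksum" with a one-pass "copy and checksum". All function names are hypothetical, and the checksum is a plain RFC 1071 style 16-bit ones' complement sum over big-endian words rather than the kernel's optimized do_csum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to 16 bits (ones' complement addition). */
static uint16_t fold64(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum only: one pass that loads every byte of buf. */
static uint16_t csum_only(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint64_t)buf[i] << 8) | buf[i + 1];
	if (i < len) /* odd trailing byte is padded with zero */
		sum += (uint64_t)buf[i] << 8;
	return fold64(sum);
}

/* Two passes: memcpy loads src once, csum_only then loads dst again. */
static uint16_t copy_then_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	memcpy(dst, src, len);
	return csum_only(dst, len);
}

/* One pass: each source byte is loaded once, stored, and summed. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(dst && src);
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint64_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {
		dst[i] = src[i];
		sum += (uint64_t)src[i] << 8;
	}
	return fold64(sum);
}
```

Both variants produce the same bytes and the same sum. The one-pass version adds only a few arithmetic ops per word on top of loads and stores the copy needs anyway, which is why a profile may shift the cost between symbols without reducing it. Whether that holds on a given platform is exactly what the A:B run would show.
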

> > Is this not already handled by __copy_skb_header above? If ip_summed
> > has to be initialized, so have csum_start and csum_offset. That call
> > should have initialized all three.
>
> Thanks, I will look into why even though __copy_skb_header is being
> called, I am still seeing skb->ip_summed set to CHECKSUM_NONE in the
> network driver.

Thanks.
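
For reference, a rough userspace model of why ip_summed, csum_start and csum_offset travel together. It is a sketch under stated assumptions, not the kernel's implementation: struct toy_skb and the toy_* helpers are hypothetical, csum_start is measured from the start of the packet bytes rather than from skb->head, and the software fallback is only loosely modelled on what skb_checksum_help does for a CHECKSUM_PARTIAL packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the ip_summed states in include/linux/skbuff.h. */
enum { CHECKSUM_NONE = 0, CHECKSUM_UNNECESSARY = 1,
       CHECKSUM_COMPLETE = 2, CHECKSUM_PARTIAL = 3 };

/* Hypothetical stand-in for the checksum-related part of struct sk_buff. */
struct toy_skb {
	uint8_t *data;        /* start of packet bytes */
	size_t len;           /* number of packet bytes */
	uint8_t ip_summed;
	uint16_t csum_start;  /* simplification: offset from data */
	uint16_t csum_offset; /* checksum field offset from csum_start */
};

/* Folded 16-bit ones' complement sum over big-endian words. */
static uint16_t toy_csum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint64_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint64_t)buf[i] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Copy the three fields as a unit; PARTIAL is meaningless without offsets. */
static void toy_copy_csum_state(struct toy_skb *to, const struct toy_skb *from)
{
	to->ip_summed = from->ip_summed;
	to->csum_start = from->csum_start;
	to->csum_offset = from->csum_offset;
}

/* Software fallback: sum from csum_start, store at csum_start + csum_offset. */
static void toy_checksum_help(struct toy_skb *skb)
{
	size_t field;
	uint16_t csum;

	if (skb->ip_summed != CHECKSUM_PARTIAL)
		return;
	field = (size_t)skb->csum_start + skb->csum_offset;
	assert(field + 2 <= skb->len);
	csum = (uint16_t)~toy_csum(skb->data + skb->csum_start,
				   skb->len - skb->csum_start);
	skb->data[field] = (uint8_t)(csum >> 8);
	skb->data[field + 1] = (uint8_t)(csum & 0xff);
	skb->ip_summed = CHECKSUM_NONE; /* nothing left for the device to do */
}
```

In this model, a segment that arrives at the driver with CHECKSUM_NONE has either been through a software fallback like the one above or never had the three fields copied from the head skb.
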