Thread: 5 messages, 3 authors, 2021-11-12

Re: [fs] a0918006f9: netperf.Throughput_tps -11.6% regression

From: Mickaël Salaün <mic@digikod.net>
Date: 2021-11-10 08:52:43
Also in: linux-api, linux-fsdevel, linux-security-module, lkml, oe-lkp

On 09/11/2021 18:21, Kees Cook wrote:
On Fri, Nov 05, 2021 at 02:41:59PM +0800, kernel test robot wrote:

Greeting,

FYI, we noticed a -11.6% regression of netperf.Throughput_tps due to commit:


commit: a0918006f9284b77397ae4f163f055c3e0f987b2 ("[PATCH v15 1/3] fs: Add trusted_for(2) syscall implementation and related sysctl")
url: https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Add-trusted_for-2-was-O_MAYEXEC/20211013-032533
patch link: https://lore.kernel.org/kernel-hardening/20211012192410.2356090-2-mic@digikod.net

in testcase: netperf
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 16
	cluster: cs-localhost
	test: TCP_CRR
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/


Please note that we did some further analysis and testing, as Fengwei reported:
==============================================================================
Here is my investigation result of this regression:

If I add a patch that keeps the kernel function and data addresses almost
the same even with this patch applied, there is almost no performance
delta (0.1%) with and without the patch.

If I only keep the function addresses the same with and without the patch,
the performance delta is about 5.1%.

So I suppose this regression is triggered by the different function and data
addresses. We do not yet know why different addresses could cause this kind
of regression.
===============================================================================
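
To make the address comparison above concrete, here is a minimal sketch of how one might list the symbols whose addresses differ between two kernel builds. It assumes System.map-style input (`<hex address> <type> <name>` per line). The helper names, the sample symbols and addresses, and the 64-byte cache-line size are all invented for illustration. The report does not say which tooling was used, and it does not establish that cache-line alignment is the cause.

```python
def parse_system_map(text):
    """Map symbol name -> address from lines of the form '<hex addr> <type> <name>'."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols[parts[2]] = int(parts[0], 16)
    return symbols


def shifted_symbols(before, after, cacheline=64):
    """Return {name: (delta_bytes, offset_changed)} for symbols whose address moved.

    offset_changed is True when the offset within a cache line differs.
    The 64-byte default is an assumption, not a value taken from the report.
    """
    result = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None or new == old:
            continue
        result[name] = (new - old, (old % cacheline) != (new % cacheline))
    return result


# Hypothetical sample data; these symbols and addresses are made up.
BASE = """\
ffffffff81000000 T func_a
ffffffff81000040 T func_b
ffffffff82000000 D data_x
"""
PATCHED = """\
ffffffff81000000 T func_a
ffffffff81000070 T func_b
ffffffff82000040 D data_x
"""

if __name__ == "__main__":
    moved = shifted_symbols(parse_system_map(BASE), parse_system_map(PATCHED))
    for name, (delta, changed) in moved.items():
        print(f"{name}: moved {delta:+d} bytes, cache-line offset changed: {changed}")
```

In real use the two inputs would be the System.map files of the baseline and patched builds. A symbol that moved by a multiple of the cache-line size keeps its in-line offset, which is why the two cases are reported separately.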


We also tested on other platforms.
On a Cooper Lake (Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory),
we also observed a regression, but the gap is smaller:
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
  cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/16/debian-10.4-x86_64-20200603.cgz/300s/lkp-cpl-4sp1/TCP_CRR/netperf/0x700001e

commit:
  v5.15-rc4
  a0918006f9284b77397ae4f163f055c3e0f987b2

       v5.15-rc4 a0918006f9284b77397ae4f163f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    333492            -5.7%     314346 ±  2%  netperf.Throughput_total_tps
     20843            -4.5%      19896        netperf.Throughput_tps


But there is no regression on a 96-thread, 2-socket Ice Lake with 256G memory:
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
  cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/16/debian-10.4-x86_64-20200603.cgz/300s/lkp-icl-2sp1/TCP_CRR/netperf/0xb000280

commit:
  v5.15-rc4
  a0918006f9284b77397ae4f163f055c3e0f987b2

       v5.15-rc4 a0918006f9284b77397ae4f163f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    555600            -0.1%     555305        netperf.Throughput_total_tps
     34725            -0.1%      34706        netperf.Throughput_tps
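
The %change columns in the two tables above can be reproduced from the reported throughput values. The following is a small sketch of that arithmetic, assuming %change is computed as (patched - base) / base. The function name and dictionary labels are illustrative, and the numbers are copied from the tables.

```python
def pct_change(base, patched):
    """Relative change of the patched result against the base result, in percent."""
    return (patched - base) / base * 100.0


# (v5.15-rc4, a0918006f9) pairs as reported in the tables.
RESULTS = {
    "Cooper Lake Throughput_total_tps": (333492, 314346),
    "Cooper Lake Throughput_tps": (20843, 19896),
    "Ice Lake Throughput_total_tps": (555600, 555305),
    "Ice Lake Throughput_tps": (34725, 34706),
}

if __name__ == "__main__":
    for label, (base, patched) in RESULTS.items():
        print(f"{label}: {pct_change(base, patched):+.1f}%")
```

Rounded to one decimal place, this gives -5.7% and -4.5% for Cooper Lake and -0.1% for both Ice Lake rows, matching the tables.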


Fengwei also helped review these results and commented:
I suppose these three CPUs have different cache policies. It could also be
related to netperf throughput testing.

Does moving the syscall implementation somewhere else change things?
That's a _huge_ performance change for something that isn't even called.
What's going on here?

This regression doesn't make sense. I guess this is the result of a
flaky netperf test, maybe because the test machine was overloaded at
that time.