Thread (3 messages, 2 authors), 2021-12-29

Re: [PATCH] cache: Workaround HiSilicon Taishan DC CVAU

From: chenweilong <hidden>
Date: 2021-12-29 03:11:54
Also in: lkml

On 2021/12/14 2:56, Will Deacon wrote:
On Fri, Nov 26, 2021 at 05:11:39PM +0800, Weilong Chen wrote:
Taishan's L1/L2 cache is inclusive, so the data in the two levels is consistent.
Any change in L1 does not require a DC operation to push the cache line from
L1 to L2, so it is safe not to clean the data cache by address to the point
of unification (DC CVAU).

Without the IDC feature, the kernel needs to flush the icache as well as the
dcache, which causes performance degradation.

The affected cores are V110/V200 variant 1.

Signed-off-by: Weilong Chen <redacted>
---
 Documentation/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                     | 11 +++++++++
 arch/arm64/include/asm/cputype.h       |  2 ++
 arch/arm64/kernel/cpu_errata.c         | 32 ++++++++++++++++++++++++++
 arch/arm64/tools/cpucaps               |  1 +
 5 files changed, 48 insertions(+)
Hmm. We don't usually apply optimisations for specific CPUs on arm64, simply
because the diversity of CPUs out there means it quickly becomes a
fragmented mess.

Is this patch purely a performance improvement? If so, please can you
provide some numbers in an attempt to justify it?

Yes, it's a performance improvement. Here is my test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

int main(void)
{
        size_t len = 200 * 1024 * 1024;         /* 200 MiB */
        struct timeval start, end;
        long interval;                          /* microseconds */
        void *tmp;

        /* Map and touch every page so the pages exist before mprotect(). */
        tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (tmp == MAP_FAILED) {
                perror("mmap failed");
                exit(EXIT_FAILURE);
        }
        memset(tmp, 0, len);

        /*
         * Time only the mprotect() call that makes the pages executable;
         * this is where the I/D cache maintenance cost shows up.
         */
        gettimeofday(&start, NULL);
        if (mprotect(tmp, len, PROT_READ | PROT_EXEC)) {
                perror("Couldn't mprotect");
                exit(EXIT_FAILURE);
        }
        gettimeofday(&end, NULL);

        interval = 1000000L * (end.tv_sec - start.tv_sec) +
                   (end.tv_usec - start.tv_usec);
        printf("interval = %fms\n", interval / 1000.0);

        munmap(tmp, len);
        return 0;
}

Without this patch, the mprotect() call takes:

interval = 25.608000ms

And with this patch:

interval = 0.689000ms

That is roughly a 37x speedup.

If you think this is acceptable, I will send a v2, as the original patch broke the CPU hotplug checks.
Thanks,

Will