Thread: 10 messages, 4 authors, 2021-05-12

Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC

From: Honnappa Nagarahalli <hidden>
Date: 2021-05-11 14:10:47

<snip>
Thanks for your suggestions. We found that the -fno-tree-vectorize option
works.
PS: In our earliest test this option had not actually been added to the build.

Solution:
1. Use the -fno-tree-vectorize option to prevent the compiler from generating
   auto-vectorized code, so the slow path will work fine.
2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
   arm/meson.build:
        'part_number_config': {
                'generic': {'machine_args': ['-march=armv8-a+crc',
                                             '-march=armv8-a+sve+crc',
                                             '-moutline-atomics']}
        }
   If the compiler doesn't support '-march=armv8-a+sve+crc', it will fall
   back to '-march=armv8-a+crc'.
   If the compiler supports '-march=armv8-a+sve+crc', it will compile the
   SVE-related code, so the IO path can support SVE.

Based on the above, we can achieve our initial target.
The 'generic' target is for generating a binary that would work on all ArmV8
machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path
would not work on non-SVE machines.
The 'generic' target is only used in our local CI (note: the two platforms
are both ARMv8 machines).

In the IO path we support NEON and SVE Rx/Tx. The code is written with ACLE
intrinsics, so it is not affected by the -fno-tree-vectorize option.
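
To illustrate why ACLE code is unaffected, here is a minimal sketch (not taken from the driver): the vector operations are written out explicitly as intrinsics, whereas -fno-tree-vectorize only stops the compiler from vectorizing plain loops on its own. The helper name sum_u32 is hypothetical, and the scalar fallback exists only so the sketch also builds on non-Arm hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>   /* ACLE NEON intrinsics */
#endif

/* Hypothetical helper (not a DPDK function): sum n 32-bit values. */
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;

    assert(v != NULL || n == 0);

#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Explicit NEON: four lanes per iteration, chosen by the programmer,
     * not by the auto-vectorizer. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_u32(acc, vld1q_u32(v + i));
    total = vaddvq_u32(acc);    /* horizontal add of the four lanes */
#endif

    /* Scalar tail (and the whole loop on non-Arm builds). */
    for (; i < n; ++i)
        total += v[i];

    return total;
}
```

With -fno-tree-vectorize the scalar tail loop stays scalar, but the intrinsic calls still map to NEON instructions.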

If the compiler supports '-march=armv8-a+sve+crc', it will compile both the
NEON- and SVE-related code.
Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.

The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE.  Ex:

#pragma GCC push_options
#pragma GCC target ("+sve")
void f(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}
#pragma GCC pop_options
void g(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}

compiles f() using SVE and g() with standard options.

You can also follow the function multiversioning discussed in the other thread.
At runtime, the driver detects whether the platform supports SVE; if it does
not, the driver selects the NEON path.
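
A minimal sketch of such a runtime check, assuming Linux on AArch64, where the kernel reports SVE through the AT_HWCAP auxiliary vector. The names rx_path_t, cpu_has_sve, select_rx_path and rx_path_name are hypothetical and are not the driver's actual code. On other platforms the sketch simply reports "no SVE".

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>               /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)       /* bit value from the Linux arm64 uapi hwcap.h */
#endif
#endif

/* Hypothetical path identifiers, for illustration only. */
typedef enum { RX_PATH_NEON, RX_PATH_SVE } rx_path_t;

/* Ask the kernel whether the CPU we are running on implements SVE. */
bool cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;                   /* non-AArch64 build: never pick SVE */
#endif
}

/* Choose the Rx/Tx path once, at initialization time. */
rx_path_t select_rx_path(bool has_sve)
{
    return has_sve ? RX_PATH_SVE : RX_PATH_NEON;
}

const char *rx_path_name(rx_path_t p)
{
    assert(p == RX_PATH_NEON || p == RX_PATH_SVE);
    return p == RX_PATH_SVE ? "SVE" : "NEON";
}
```

The init code would call select_rx_path(cpu_has_sve()) once and store the result. This only protects the functions reached through that selection; anything else compiled with +sve can still fault on a NEON-only machine.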

Best regards.

On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
<snip>
On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
[off-list ref] wrote:
Hi, ALL
We have a question for your help:
  1. We have two platforms, both of which are ARM64. One supports both NEON
     and SVE; the other supports only NEON.
  2. We want to run on both platforms with a single binary file, and use the
     highest vector capability of the corresponding platform whenever
     possible.

I see VPP has a similar feature. IMO, it is not present in DPDK.
Basically, in order to do this:
- Compile slow-path code (90% of DPDK) with minimal CPU instruction set
  support.
- Have fast-path functions compiled with different CPU instruction set
  levels.
- In the slow path, attach the fast-path function pointer based on the CPU
  instruction-level support.
Agree.
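
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not existing DPDK code: it assumes a GCC recent enough to accept the AArch64 target("+sve") function attribute, a file built without +sve, and hypothetical names (fill_fn_t, fill_baseline, fill_sve, select_fill, fill_dispatch). The capability flag is passed in by the caller; how it is obtained is out of scope here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fast-path signature; not a DPDK API. */
typedef void (*fill_fn_t)(int *x, size_t n);

/* Baseline version: built with the file's default -march (no +sve). */
void fill_baseline(int *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] = (int)i;
}

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
/* Same loop, but the compiler may use SVE inside this one function only. */
__attribute__((target("+sve")))
void fill_sve(int *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] = (int)i;
}
#define HAVE_FILL_SVE 1
#else
#define HAVE_FILL_SVE 0     /* other compilers/targets: baseline only */
#endif

/* Slow path: pick the implementation once, based on a capability flag. */
fill_fn_t select_fill(int cpu_has_sve)
{
#if HAVE_FILL_SVE
    if (cpu_has_sve)
        return fill_sve;
#else
    (void)cpu_has_sve;
#endif
    return fill_baseline;
}

/* Fast path: always goes through the selected function pointer. */
void fill_dispatch(int *x, size_t n, int cpu_has_sve)
{
    fill_fn_t fn = select_fill(cpu_has_sve);
    assert(fn != NULL);
    fn(x, n);
}
```

In real code the pointer would be selected once at init and cached rather than looked up on every call. The key point is that everything outside fill_sve is compiled for the minimal instruction set.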
  3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC 10.2).
This defines the minimum capabilities of the target machine.
     However, it is found that invalid instructions occur when the program
     runs on a machine that does not support SVE (pls see below).
  4. The problem is caused by the introduction of SVE in GCC automatic vector
     optimization.

  So is there a way to disable GCC automatic vector optimization, or to use
  only NEON to perform automatic vector optimization?
I do not think this is safe. Once SVE is enabled, the compiler is allowed to
use SVE instructions wherever it sees fit.
  BTW: we already tested -fno-tree-vectorize (as in the link below) but found
  no effect.
https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc


The GDB output:
     EAL: Detected 128 lcore(s)
     EAL: Detected 4 NUMA nodes
     Option -w, --pci-whitelist is deprecated, use -a, --allow option instead

     Program received signal SIGILL, Illegal instruction.
     0x0000000000671b88 in eal_adjust_config ()
     (gdb)
     (gdb) where
     #0  0x0000000000671b88 in eal_adjust_config ()
     #1  0x0000000000682840 in rte_eal_init ()
     #2  0x000000000051c870 in main ()
     (gdb)

The disassembly output of eal_adjust_config:
     671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
     671b80:       f110001f        cmp     x0, #0x400
     671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
     671b88:       043357f5        addvl   x21, x19, #-1
     671b8c:       043457e1        addvl   x1, x20, #-1
     671b90:       910562b5        add     x21, x21, #0x158
     671b94:       04e0e3e0        cntd    x0
     671b98:       914012b5        add     x21, x21, #0x4, lsl #12
     671b9c:       52800218        mov     w24, #0x10                      // #16
     671ba0:       25d8e3e1        ptrue   p1.d
     671ba4:       25f80fe0        whilelo p0.d, wzr, w24
     671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]


Best regards.
  