Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure
From: Fāng-ruì Sòng <hidden>
Date: 2021-01-12 17:46:22
Also in: linux-doc, lkml
On Tue, Jan 12, 2021 at 9:37 AM 'Nick Desaulniers' via Clang Built Linux [off-list ref] wrote:
> On Mon, Jan 11, 2021 at 9:14 PM Bill Wendling [off-list ref] wrote:
> >
> > From: Sami Tolvanen <samitolvanen@google.com>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
>
> Please drop all changes to arch/* that are not to arch/x86/ then; we
> can cross that bridge when we get to each arch. For example, there's
> no point disabling PGO for architectures LLVM doesn't even have a
> backend for.
>
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
>
> Then the Kconfig option should depend on !GCOV so that they are
> mutually exclusive and can't be selected together accidentally; such
> as by bots doing randconfig tests.

The profile formats (Clang PGO, Clang gcov, GCC gcov/PGO) are different,
but Clang PGO can be used with Clang's gcov implementation:

  clang -fprofile-generate --coverage a.cc; ./a.out
  => default*.profraw + a.gcda

> <large snip>
>
> > +static inline int inst_prof_popcount(unsigned long long value)
> > +{
> > +	value = value - ((value >> 1) & 0x5555555555555555ULL);
> > +	value = (value & 0x3333333333333333ULL) +
> > +		((value >> 2) & 0x3333333333333333ULL);
> > +	value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> > +
> > +	return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> > +}
>
> The kernel has a portable popcnt implementation called hweight64 if
> you #include <asm-generic/bitops/hweight.h>; does that work here?
> https://en.wikipedia.org/wiki/Hamming_weight
>
> --
> Thanks,
> ~Nick Desaulniers
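
For anyone who wants to sanity-check the quoted bit-twiddling outside the kernel, here is a minimal userspace sketch. It assumes nothing about how hweight64 is implemented: since that helper is kernel-only, the reference below is a plain clear-the-lowest-set-bit loop. The names swar_popcount64, naive_popcount64 and check_popcount_agreement are made up for this illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace copy of the arithmetic in the quoted inst_prof_popcount():
 * sum bits in 2-bit, then 4-bit, then 8-bit groups, and finally add
 * all eight byte counts together via a multiply.
 */
static inline int swar_popcount64(uint64_t value)
{
	value = value - ((value >> 1) & 0x5555555555555555ULL);
	value = (value & 0x3333333333333333ULL) +
		((value >> 2) & 0x3333333333333333ULL);
	value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;

	return (int)((value * 0x0101010101010101ULL) >> 56);
}

/*
 * Hypothetical reference implementation (not the kernel's hweight64):
 * clear the lowest set bit until no bits remain, counting iterations.
 */
static inline int naive_popcount64(uint64_t value)
{
	int count = 0;

	while (value) {
		value &= value - 1;
		count++;
	}
	return count;
}

/* Hypothetical self-check: both versions must agree on a few patterns. */
static inline void check_popcount_agreement(void)
{
	static const uint64_t samples[] = {
		0x0ULL,
		0x1ULL,
		0xFFULL,
		0x8000000000000001ULL,
		0x5555555555555555ULL,
		0xFFFFFFFFFFFFFFFFULL,
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(swar_popcount64(samples[i]) ==
		       naive_popcount64(samples[i]));
}
```

Calling check_popcount_agreement() from a small test program exercises both versions on the sample patterns. This only shows that the open-coded routine computes a Hamming weight; whether hweight64 can be used in the PGO code is still the open question above.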