Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure
From: Sedat Dilek <hidden>
Date: 2021-01-18 01:02:58
Also in:
linux-kbuild, lkml
On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers [off-list ref] wrote:
>
> On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers [off-list ref] wrote:
> >
> > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor [off-list ref] wrote:
> > >
> > > However, I see an issue with actually using the data:
> > >
> > > $ sudo -s
> > > # mount -t debugfs none /sys/kernel/debug
> > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > # chown nathan:nathan vmlinux.profraw
> > > # exit
> > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > error: No profiles could be merged.
> > >
> > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > "real" x86 hardware that I can afford to test on right now.
> >
> > Same. I think the magic calculation in this patch may differ from
> > upstream llvm:
> > https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
>
> Err...it looks like it was the padding calculation. With that fixed
> up, we can query the profile data to get insights on the most heavily
> called functions. Here's what my top 20 are (reset, then watch 10
> minutes worth of cat videos on youtube while running `find /` and
> sleeping at my desk). Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data! Wow, I can run `find /` and watch cat videos on youtube
> so fast! Users will love this! /s
>
> Checking the sections sizes of .text.hot. and .text.unlikely. looks good!

On each rebuild, do I need to pass this to make ...?

LLVM=1 -fprofile-use=vmlinux.profdata

Did you also try this together with passing LLVM_IAS=1 to make?

- Sedat -

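For reference, the collect, merge and rebuild cycle discussed in this thread can be written down as a small dry-run helper. This is only a sketch: the function name `pgo_cycle_cmds` is made up here, the first two commands are copied from Nathan's session quoted above, and passing `-fprofile-use=` through `KCFLAGS` is an assumption about how the flag would reach clang, not something confirmed in this message.

```shell
#!/bin/sh
# Print (do not run) the commands for one collect -> merge -> rebuild cycle.
# pgo_cycle_cmds is a hypothetical helper name used only for this sketch.
pgo_cycle_cmds() {
    profraw=${1:-vmlinux.profraw}
    profdata=${2:-vmlinux.profdata}

    # 1. Copy the raw profile out of debugfs (run as root on the test machine).
    echo "cp -a /sys/kernel/debug/pgo/profraw $profraw"
    # 2. Convert the raw profile into the indexed format that clang reads.
    echo "llvm-profdata merge --output=$profdata $profraw"
    # 3. Rebuild with the profile. ASSUMPTION: the flag is handed to the
    #    compiler via KCFLAGS; LLVM_IAS=1 could be appended to this line.
    echo "make LLVM=1 KCFLAGS=-fprofile-use=$profdata"
}

pgo_cycle_cmds "$@"
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; the output can be reviewed and pasted once the paths are right for the machine in question.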
> $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> Instrumentation level: IR  entry_first = 0
> Total functions: 48970
> Maximum function count: 62070879
> Maximum internal block count: 83221158
> Top 20 functions with the largest internal block counts:
>   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
>   rcu_read_unlock_strict, max count = 62070879
>   _cond_resched, max count = 25486882
>   rcu_all_qs, max count = 25451477
>   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
>   _raw_spin_unlock_irqrestore, max count = 18874121
>   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
>   _raw_spin_lock_irqsave, max count = 18509161
>   memchr, max count = 15525452
>   _raw_spin_lock, max count = 15484254
>   __mod_memcg_state, max count = 14604619
>   __mod_memcg_lruvec_state, max count = 14602783
>   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
>   __mod_lruvec_state, max count = 12527154
>   __mod_node_page_state, max count = 12525172
>   native_sched_clock, max count = 8904692
>   sched_clock_cpu, max count = 8895832
>   sched_clock, max count = 8894627
>   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
>   fpregs_assert_state_consistent, max count = 8287198
>
> --
> Thanks,
> ~Nick Desaulniers
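
To make a listing like the top-20 output above easier to eyeball, each count can be expressed relative to the largest one. A minimal sketch, assuming the plain `name, max count = N` line format shown in this message (without any quote prefix); `top_share` is a made-up helper name, not part of llvm-profdata:

```shell
#!/bin/sh
# Read `llvm-profdata show -topn=N` text on stdin and print each listed
# function with its count as a percentage of the largest count seen.
# top_share is a hypothetical helper name used only for this sketch.
top_share() {
    awk -F', max count = ' '
        # Only the per-function lines contain the separator, so NF == 2.
        NF == 2 {
            name[++n] = $1
            count[n] = $2 + 0
            if (count[n] > max) max = count[n]
        }
        END {
            for (i = 1; i <= n; i++) {
                sub(/^[ \t]+/, "", name[i])   # drop leading indentation
                if (max > 0)
                    printf "%5.1f%% %s\n", 100 * count[i] / max, name[i]
            }
        }'
}
```

It would be used as `llvm-profdata show -topn=20 /tmp/vmlinux.profraw | top_share`. Header lines such as `Total functions:` do not contain the separator and are skipped.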