Thread (99 messages) 99 messages, 10 authors, 2021-06-14

Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

From: Fangrui Song <hidden>
Date: 2021-06-12 20:21:42
Also in: linux-doc, lkml

On 2021-06-12, Peter Zijlstra wrote:
On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
quoted
On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra [off-list ref] wrote:
quoted
Also, and I don't see this answered *anywhere*, why are you not using
perf for this? Your link even mentions Sampling Profilers (and I happen
to know there's been significant effort to make perf output work as
input for the PGO passes of the various compilers).
Instruction-based (non-sampling) profiling gives us a better
context-sensitive profile, making PGO more impactful. It's also useful
for coverage whereas sampling profiles cannot.
We've got KCOV and GCOV support already. Coverage is also not an
argument mentioned anywhere else. Coverage can go pound sand, we really
don't need a third means of getting that.

Do you have actual numbers that back up the sampling vs instrumented
argument? Having the instrumentation will affect performance which can
scew the profile just the same.

Also, sampling tends to capture the hot spots very well.
[I don't do kernel development. My experience is user-space toolchain.]

For applications, I think instrumentation based PGO can be 1%~4% faster
than sample-based PGO (e.g. AutoFDO) on x86.

Sample-based PGO has CPU requirement (e.g. Performance Monitoring Unit).
(my gut feeling is that there may be larger gap between instrumentation
based PGO and sample-based PGO for aarch64/ppc64, even though they can
use sample-based PGO.)
Instrumentation based PGO can be ported to more architectures.

In addition, having an infrastructure for instrumentation based PGO
makes it easy to deploy newer techniques like context-sensitive PGO
(just changed compile options; it doesn't need new source level
annotation).
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help