Thread: 99 messages, 10 authors, 2021-06-14

Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-06-12 20:25:41
Also in: linux-kbuild, lkml

On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > You're modifying a lot of x86 files, you don't think it's good to let us
> > know?  Worse, afaict this -fprofile-generate changes code generation,
> > and we definitely want to know about that.
> I got the list of people to add from scripts/get_maintainer.pl.
> $ ./scripts/get_maintainer.pl -f arch/x86/Makefile
> Thomas Gleixner [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> Ingo Molnar [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> Borislav Petkov [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> There's one Intel person CC'ed, but he didn't sign off on it.
Intel does not employ the main x86 maintainers, and even if it did,
mailing a random Google person won't get the mail to you either, will it?
> These patches have been available for review for months now,
Which doesn't help if you don't Cc the right people, does it? *nobody*
has time to read LKML.
> and posted to all of the lists and CC'ed to the people from
> scripts/get_maintainer.pl. Perhaps that program should be improved?
I suspect operator error, see above.
> > Supposedly -fprofile-generate adds instrumentation to the generated
> > code. noinstr *MUST* disable that. If not, this is a complete
> > non-starter for x86.
> "noinstr" has "notrace", which is defined as
> "__attribute__((__no_instrument_function__))", which is honored by
> both gcc and clang.
Yes it is, but is that sufficient in this case? It very much isn't for
KASAN, UBSAN, and a whole host of other instrumentation crud. They all
needed their own 'bugger-off' attributes.
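The attribute stacking Peter describes can be sketched in C: because each instrumentation pass has its own opt-out attribute, a noinstr-style marker has to combine all of them. This is a minimal sketch assuming GCC or clang; the macro names (NOINSTR_SKETCH, SKETCH_NO_*) and the functions entry_work() and check_entry_work() are hypothetical, and this is not the kernel's actual noinstr definition. Support for __no_profile_instrument_function__ varies by compiler version, so the optional attributes are probed with __has_attribute and silently dropped when unsupported.

```c
#include <assert.h>

/* Compilers without __has_attribute: treat every probe as unsupported. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Suppresses -finstrument-functions entry/exit hooks (what notrace maps to, per the thread). */
#define SKETCH_NO_TRACE __attribute__((__no_instrument_function__))

/* Suppresses profile-generation counters, if the compiler knows the attribute. */
#if __has_attribute(__no_profile_instrument_function__)
#define SKETCH_NO_PROFILE __attribute__((__no_profile_instrument_function__))
#else
#define SKETCH_NO_PROFILE
#endif

/* Suppresses AddressSanitizer instrumentation, if supported. */
#if __has_attribute(__no_sanitize_address__)
#define SKETCH_NO_ASAN __attribute__((__no_sanitize_address__))
#else
#define SKETCH_NO_ASAN
#endif

/* Hypothetical noinstr-style marker: one opt-out per instrumentation. */
#define NOINSTR_SKETCH SKETCH_NO_TRACE SKETCH_NO_PROFILE SKETCH_NO_ASAN

/* Hypothetical function that must not carry any instrumentation. */
NOINSTR_SKETCH int entry_work(int x)
{
	return x * 2 + 1;
}

/* Usage check: the attributes change code generation, not semantics. */
void check_entry_work(void)
{
	assert(entry_work(3) == 7);
}
```

Keeping every opt-out behind one macro means call sites carry a single marker; the catch, as Peter notes, is that every new kind of instrumentation needs its own entry in that list.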
> > We've got KCOV and GCOV support already. Coverage is also not an
> > argument mentioned anywhere else. Coverage can go pound sand, we really
> > don't need a third means of getting that.
> Those aren't useful for clang-based implementations. And I like to
> look forward to potential improvements.
I look forward to fewer things doing the same over and over. The obvious
solution is of course to make clang use what we have, not the other way
around.
> > Do you have actual numbers that back up the sampling vs instrumented
> > argument? Having the instrumentation will affect performance, which can
> > skew the profile just the same.
> Instrumentation counts the number of times a branch is taken. Sampling
> is at a gross level, where if the sampling time is fine enough, you
> can get an idea of where the hot spots are, but it won't give you the
> fine-grained information that clang finds useful. Essentially, while
> sampling can "capture the hot spots very well", relying solely on
> sampling is basically leaving optimization on the floor.
>
> Our optimization experts here have determined, through data of
> course, that instrumentation is the best option for PGO.
It would be very good to post some of that data and explicit examples.
Hearsay doesn't carry much weight.
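The exact-count versus sampled-estimate distinction Bill draws can be illustrated with a toy model in C: the instrumented profile bumps a counter on every branch outcome, while the sampled profile observes only every Nth iteration and scales each observation by N. This is a sketch under stated assumptions, not how clang's -fprofile-generate or perf actually work; SAMPLE_PERIOD, struct branch_profile, rare_branch() and the run_*/compare_profiles() functions are hypothetical, and the profiled branch is assumed to be taken once every 100 iterations.

```c
#include <assert.h>

/* Assumed sampling period: one observation per 64 iterations. */
#define SAMPLE_PERIOD 64UL

/* Hypothetical per-branch profile: how often a branch went each way. */
struct branch_profile {
	unsigned long taken;
	unsigned long not_taken;
};

/* The branch being profiled: taken once every 100 iterations. */
static int rare_branch(unsigned long i)
{
	return i % 100 == 0;
}

/* Instrumented: a counter increment on every outcome, so counts are exact. */
void run_instrumented(unsigned long iters, struct branch_profile *p)
{
	for (unsigned long i = 0; i < iters; i++) {
		if (rare_branch(i))
			p->taken++;
		else
			p->not_taken++;
	}
}

/* Sampled: only every SAMPLE_PERIOD-th iteration is observed, and each
 * observation is scaled by the period to estimate the real count. */
void run_sampled(unsigned long iters, struct branch_profile *p)
{
	for (unsigned long i = 0; i < iters; i++) {
		if ((i + 1) % SAMPLE_PERIOD != 0)
			continue; /* iteration not sampled */
		if (rare_branch(i))
			p->taken += SAMPLE_PERIOD;
		else
			p->not_taken += SAMPLE_PERIOD;
	}
}

/* Usage check over 1000 iterations. */
void compare_profiles(void)
{
	struct branch_profile exact = { 0, 0 }, est = { 0, 0 };

	run_instrumented(1000, &exact);
	run_sampled(1000, &est);

	assert(exact.taken == 10 && exact.not_taken == 990);
	/* The 15 samples land on iterations 63, 127, ..., 959, all odd, so
	 * none sees the branch taken (it is taken only on multiples of 100). */
	assert(est.taken == 0 && est.not_taken == 15 * SAMPLE_PERIOD);
}
```

In this model the sampled profile misses the rare branch entirely while the instrumented one counts it exactly. The model says nothing about Peter's concern that the counter updates themselves change the timing being measured.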