Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure
From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-06-12 20:25:41
Also in:
linux-kbuild, lkml
On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > You're modifying a lot of x86 files, you don't think it's good to let us know? Worse, afaict this -fprofile-generate changes code generation, and we definitely want to know about that.
>
> I got the list of people to add from the scripts/get_maintainer.pl.
$ ./scripts/get_maintainer.pl -f arch/x86/Makefile
Thomas Gleixner [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Ingo Molnar [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Borislav Petkov [off-list ref] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> there's one intel people CC'ed, but he didn't sign off on it.
Intel does not employ the main x86 maintainers; even if it did, mailing a random Google person won't get the mail to you either, would it?
> These patches were available for review for months now,
Which doesn't help if you don't Cc the right people, does it. *nobody* has time to read LKML.
> and posted to all of the lists and CC'ed to the people from scripts/get_maintainers.pl. Perhaps that program should be improved?
I suspect operator error, see above.
> > Supposedly -fprofile-generate adds instrumentation to the generated code. noinstr *MUST* disable that. If not, this is a complete non-starter for x86.
>
> "noinstr" has "notrace", which is defined as "__attribute__((__no_instrument_function__))", which is honored by both gcc and clang.
Yes it is, but is that sufficient in this case? It very much isn't for KASAN, UBSAN, and a whole host of other instrumentation crud. They all needed their own 'bugger-off' attributes.
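To illustrate what "their own attributes" means in practice, here is a minimal sketch, assuming a GCC or clang recent enough to provide the attributes probed below. The demo_* macro and function names are hypothetical and are not the kernel's definitions. Whether the first attribute alone keeps -fprofile-generate out of a function is exactly the open question; the sketch only shows that each kind of instrumentation is opted out of separately.

```c
/* Fall back gracefully on compilers that lack __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Opt-out for -finstrument-functions / -pg style entry and exit hooks. */
#define demo_notrace __attribute__((__no_instrument_function__))

/* Separate opt-out for profile instrumentation, where the compiler has one. */
#if __has_attribute(no_profile_instrument_function)
#define demo_no_profile __attribute__((no_profile_instrument_function))
#else
#define demo_no_profile /* assumption: not available, expands to nothing */
#endif

/* Sanitizers need yet another opt-out. */
#if __has_attribute(no_sanitize)
#define demo_no_sanitize __attribute__((no_sanitize("address")))
#else
#define demo_no_sanitize /* assumption: not available, expands to nothing */
#endif

/* Hypothetical function that must stay free of all of the above. */
demo_notrace demo_no_profile demo_no_sanitize
int demo_noinstr_entry(int x)
{
	return x + 1;
}
```

Checking the generated assembly for such a function, with and without -fprofile-generate, is one way to confirm which attributes the compiler actually honors.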
> > We've got KCOV and GCOV support already. Coverage is also not an argument mentioned anywhere else. Coverage can go pound sand, we really don't need a third means of getting that.
>
> Those aren't useful for clang-based implementations. And I like to look forward to potential improvements.
I look forward to less things doing the same over and over. The obvious solution is of course to make clang use what we have, not the other way around.
> > Do you have actual numbers that back up the sampling vs instrumented argument? Having the instrumentation will affect performance which can skew the profile just the same.
>
> Instrumentation counts the number of times a branch is taken. Sampling is at a gross level, where if the sampling time is fine enough, you can get an idea of where the hot spots are, but it won't give you the fine-grained information that clang finds useful. Essentially, while sampling can "capture the hot spots very well", relying solely on sampling is basically leaving optimization on the floor. Our optimizations experts here have determined, through data of course, that instrumentation is the best option for PGO.
It would be very good to post some of that data and explicit examples. Hearsay doesn't carry much weight.
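For reference, the difference between the two approaches in the quoted paragraph can be sketched with a toy example. This is not what clang emits and not how perf samples; all names are hypothetical, and the fixed-period "sampler" merely stands in for a real time-based one. It also does not model the run-time overhead that instrumentation adds, which is the other half of the argument.

```c
struct branch_counts {
	unsigned long taken;
	unsigned long not_taken;
};

/* The branch under study: taken when v is a multiple of 3. */
static int branch_taken(unsigned int v)
{
	return v % 3 == 0;
}

/* Instrumentation style: bump a counter on every execution of the branch. */
struct branch_counts count_instrumented(unsigned int n)
{
	struct branch_counts c = { 0, 0 };
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (branch_taken(i))
			c.taken++;
		else
			c.not_taken++;
	}
	return c;
}

/* Sampling style: only observe every period-th iteration. */
struct branch_counts count_sampled(unsigned int n, unsigned int period)
{
	struct branch_counts c = { 0, 0 };
	unsigned int i;

	if (period == 0)
		period = 1; /* avoid an endless loop on a zero period */

	for (i = 0; i < n; i += period) {
		if (branch_taken(i))
			c.taken++;
		else
			c.not_taken++;
	}
	return c;
}
```

With n = 9, the instrumented count sees the exact 3 taken / 6 not taken split, while a sampling period of 3 happens to observe only taken branches. That is a contrived worst case, which is why measured data from real workloads would say more than a toy like this.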