Thread (212 messages) 212 messages, 19 authors, 2020-09-10

Re: [PATCH 00/22] add support for Clang LTO

From: Peter Zijlstra <peterz@infradead.org>
Date: 2020-06-25 08:24:59
Also in: linux-arch, linux-kbuild, linux-pci, lkml

On Thu, Jun 25, 2020 at 10:03:13AM +0200, Peter Zijlstra wrote:
On Wed, Jun 24, 2020 at 02:31:36PM -0700, Nick Desaulniers wrote:
quoted
On Wed, Jun 24, 2020 at 2:15 PM Peter Zijlstra [off-list ref] wrote:
quoted
On Wed, Jun 24, 2020 at 01:31:38PM -0700, Sami Tolvanen wrote:
quoted
This patch series adds support for building x86_64 and arm64 kernels
with Clang's Link Time Optimization (LTO).

In addition to performance, the primary motivation for LTO is to allow
Clang's Control-Flow Integrity (CFI) to be used in the kernel. Google's
Pixel devices have shipped with LTO+CFI kernels since 2018.

Most of the patches are build system changes for handling LLVM bitcode,
which Clang produces with LTO instead of ELF object files, postponing
ELF processing until a later stage, and ensuring initcall ordering.

Note that first objtool patch in the series is already in linux-next,
but as it's needed with LTO, I'm including it also here to make testing
easier.
I'm very sad that yet again, memory ordering isn't addressed. LTO vastly
increases the range of the optimizer to wreck things.
Hi Peter, could you expand on the issue for the folks on the thread?
I'm happy to try to hack something up in LLVM if we check that X does
or does not happen; maybe we can even come up with some concrete test
cases that can be added to LLVM's codebase?
I'm sure Will will respond, but the basic issue is the trainwreck C11
made of dependent loads.

Anyway, here's a link to the last time this came up:

  https://lore.kernel.org/linux-arm-kernel/20171116174830.GX3624@linux.vnet.ibm.com/ (local)
Another good read:

  https://lore.kernel.org/lkml/20150520005510.GA23559@linux.vnet.ibm.com/ (local)

and having (partially) re-read that, I now worry intensily about things
like latch_tree_find(), cyc2ns_read_begin, __ktime_get_fast_ns().

It looks like kernel/time/sched_clock.c uses raw_read_seqcount() which
deviates from the above patterns by, for some reason, using a primitive
that includes an extra smp_rmb().

And this is just the few things I could remember off the top of my head,
who knows what else is out there.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help