Thread (58 messages), 12 authors, 2020-07-08

Re: [PATCH 18/18] arm64: lto: Strengthen READ_ONCE() to acquire when CLANG_LTO=y

From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2020-07-06 19:42:43
Also in: linux-alpha, lkml, virtualization

On Mon, Jul 06, 2020 at 09:23:26PM +0200, Marco Elver wrote:
On Mon, 6 Jul 2020 at 20:35, Will Deacon [off-list ref] wrote:
On Mon, Jul 06, 2020 at 05:00:23PM +0100, Dave Martin wrote:
On Thu, Jul 02, 2020 at 08:23:02AM +0100, Will Deacon wrote:
On Wed, Jul 01, 2020 at 06:07:25PM +0100, Dave P Martin wrote:
Also, can you illustrate code that can only be unsafe with Clang LTO?
I don't have a concrete example, but it's an ongoing concern over on the LTO
thread [1], so I cooked this to show one way we could deal with it. The main
concern is that the whole-program optimisations enabled by LTO may allow the
compiler to enumerate possible values for a pointer at link time and replace
an address dependency between two loads with a control dependency instead,
defeating the dependency ordering within the CPU.
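
[Editor's illustration, not part of the original message.] The
transformation described above can be sketched in plain C. This is a
minimal sketch under stated assumptions: the names item_a, item_b,
published and both reader functions are hypothetical, a volatile access
stands in for the kernel's READ_ONCE(), and the second function is
hand-written to show what a compiler might emit if it could prove at
link time that the pointer only ever holds one of two known addresses.
It is not output from any real compiler.

```c
#include <stddef.h>

/* Hypothetical object type and the only two objects the pointer can refer to. */
struct item {
	int payload;
};

struct item item_a = { .payload = 1 };
struct item item_b = { .payload = 2 };

/* Published pointer; the volatile access stands in for READ_ONCE(). */
struct item *volatile published = &item_a;

/*
 * As written in the source: the address of the second load is computed
 * from the value returned by the first load (an address dependency).
 * arm64 orders the two loads because of that dependency.
 */
int read_with_address_dependency(void)
{
	struct item *p = published;	/* load 1: the pointer itself */

	return p->payload;		/* load 2: address derived from load 1 */
}

/*
 * Hand-written equivalent of what a whole-program optimiser might emit
 * after enumerating the possible pointer values. Each payload load now
 * uses a constant address, so only the branch (a control dependency)
 * links it to the pointer load, and the CPU may speculate the payload
 * load ahead of the pointer load.
 */
int read_with_control_dependency(void)
{
	struct item *p = published;	/* load 1: the pointer itself */

	if (p == &item_a)
		return item_a.payload;	/* load 2: constant address */
	return item_b.payload;		/* load 2: constant address */
}
```

Both functions return the same value in a single-threaded run; the
difference only matters when another CPU publishes the pointer
concurrently and the reader relies on dependency ordering.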
Why can't that happen without LTO?
It could, but I'd argue that it's considerably less likely because there
is less information available to the compiler to perform these sorts of
optimisations. It also doesn't appear to be happening in practice.

The current state of affairs is that, if/when we catch the compiler
performing harmful optimisations, we look for a way to disable them.
However, there are good reasons to enable LTO, so this is one way to
do that without having to worry about the potential impact on dependency
ordering.
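
[Editor's illustration, not part of the original message.] The idea of
strengthening the read to acquire can be sketched with the GCC/Clang
__atomic_load_n() builtin. This is only a sketch under stated
assumptions: SKETCH_READ_ONCE(), SKETCH_READ_ONCE_ACQUIRE(), struct msg
and the reader function are hypothetical names, and this is neither the
kernel's READ_ONCE() nor the implementation in the patch under
discussion.

```c
/*
 * Hypothetical macros for illustration only.
 * The relaxed form relies on dependency ordering for later accesses;
 * the acquire form orders all later accesses after the load, so it
 * stays correct even if the compiler breaks the address dependency.
 */
#define SKETCH_READ_ONCE(x)		__atomic_load_n(&(x), __ATOMIC_RELAXED)
#define SKETCH_READ_ONCE_ACQUIRE(x)	__atomic_load_n(&(x), __ATOMIC_ACQUIRE)

/* Hypothetical message type and a single statically initialised instance. */
struct msg {
	int data;
};

struct msg slot = { .data = 42 };
struct msg *shared = &slot;

/* Reader that no longer depends on the address dependency surviving. */
int consume_with_acquire(void)
{
	struct msg *p = SKETCH_READ_ONCE_ACQUIRE(shared);

	return p->data;
}

/* Reader that relies on the address dependency being preserved. */
int consume_with_dependency(void)
{
	struct msg *p = SKETCH_READ_ONCE(shared);

	return p->data;
}
```

The trade-off is that the acquire form is stronger than most callers
need, which is the cost weighed against enabling LTO here.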
If it's of any help, I'll see if we can implement that warning in LLVM
if data dependencies somehow disappear (although I don't have any
cycles to pursue right now myself). Until then, short of manual
inspection or encountering a bug in the wild, there is no proof any of
this happens or doesn't happen.

Also, as some anecdotal evidence that it's extremely unlikely, even with
LTO: looking at the passes that LLVM runs, there are a number of
passes that seem to want to eliminate basic blocks, thereby getting
rid of branches. Intuitively, it makes sense, because branches are
expensive on most architectures (for GPU targets, I think it tries
even harder to get rid of branches). If we extend our reasoning and
assumptions of LTO's aggressiveness in that direction, we might
actually end up with fewer branches. That might be beneficial for the
data dependencies we worry about (but not so much for control
dependencies we want to keep). Still, no point in speculating (no pun
intended) until we have hard data on what actually happens. :-)
Anything along these lines would be very welcome!!!

							Thanx, Paul

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel