[PATCH v4 2/7] arm64: introduce interfaces to hotpatch kernel and module code

From: Jiang Liu <hidden>
Date: 2013-10-18 15:09:05
Also in: lkml

On 10/18/2013 09:44 PM, Jon Medhurst (Tixy) wrote:

On Fri, 2013-10-18 at 09:56 +0100, Will Deacon wrote:

quoted

Hi Tixy,

On Thu, Oct 17, 2013 at 04:24:01PM +0100, Jon Medhurst (Tixy) wrote:

quoted

On Thu, 2013-10-17 at 12:38 +0100, Will Deacon wrote:

quoted

On Thu, Oct 17, 2013 at 07:19:35AM +0100, Jiang Liu wrote:

quoted

+	/*
+	 * Execute __aarch64_insn_patch_text() on every online CPU,
+	 * which ensure serialization among all online CPUs.
+	 */
+	return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
+}

Whoa, whoa, whoa! The comment here is wrong -- we only run the patching on
*one* CPU, which is the right thing to do. However, the arch/arm/ call to
stop_machine in kprobes does actually run the patching code on *all* the
online cores (including the cache flushing!). I think this is to work around
cores without hardware cache maintenance broadcasting, but that could easily
be called out specially (like we do in patch.c) and the flushing could be
separated from the patching too.

[...]

For code modifications done in 32bit ARM kprobes (and ftrace) I'm not
sure we ever actually resolved the possible cache flushing issues. If
there was specific reasons for flushing on all cores I can't remember
them, sorry. I have a suspicion that doing so was a case of sticking
with what the code was already doing, and flushing on all cores seemed
safest to guard against problems we hadn't thought about.

[...]

quoted

Sorry, I don't think I've added much light on things here have I?

I think you missed the bit I was confused about :) Flushing the cache on
each core is necessary if cache_ops_need_broadcast, so I can understand why
you'd have code to do that. The bit I don't understand is that you actually
patch the instruction on each core too!

This is only happens when removing a kprobe with __arch_disarm_kprobe().
We can't just use the intelligent patch_text() function there because we
want to always force stop machine to be used as this prevents the case
where a CPU a hits the probe, starts executing it's handler then another
CPU whips away the probe from under it.

That explains why we use stop_machine, but not why all CPU's must modify
the instruction. I think it's a case of just that it's simpler to do
that unconditionally rather than add extra code for the
cache_ops_need_broadcast() case. I mean, stop_machine() is a sledge
hammer, which stalls the whole system until the next scheduler tick, and
then gets every CPU to busy wait, so there's not much incentive to try
and optimise the code to avoid a memory write + cacheline flush on each
core.

This reminds me, I'm sure I heard rumours quite some time ago that Paul
McKenney was thinking of trying to do away with stop_machine...?

I remember McKenney has tried that, don't know about the progress now.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help