Re: [PATCH 21/35] arm64: mte: Add in-kernel tag fault handler | linux-arm-kernel


On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> [off-list ref] wrote:
> > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > +                        struct pt_regs *regs)
> > > +{
> > > +     report_tag_fault(addr, esr, regs);
> > > +
> > > +     /* Skip over the faulting instruction and continue: */
> > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> >
> > Ooooh, do we expect the kernel to still behave correctly after this? I
> > thought the recovery means disabling tag checking altogether and
> > restarting the instruction rather than skipping over it.
>
> The intention is to be able to catch multiple MTE faults without
> panicking or disabling MTE when executing KASAN tests (those do
> multiple bad accesses one after another).

The problem is that for MTE synchronous tag check faults, the access has
not happened, so you basically introduce memory corruption by skipping
the access.
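
To make the corruption argument concrete, here is a toy user-space model (not kernel code). It assumes a synchronous tag check fault leaves memory untouched, as described above. Every name in it (toy_mem, toy_store, RECOVER_SKIP, ...) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GRANULE  16u   /* MTE tags memory in 16-byte granules */
#define TOY_MEM_SIZE 64u

struct toy_mem {
	uint8_t data[TOY_MEM_SIZE];
	uint8_t tag[TOY_MEM_SIZE / TOY_GRANULE]; /* allocation tag per granule */
	bool tag_check;                          /* models tag checking on/off */
};

enum toy_recovery { RECOVER_SKIP, RECOVER_DISABLE_AND_RETRY };

/* One store "instruction": true if it completed, false if it faulted. */
static bool toy_store(struct toy_mem *m, unsigned int addr,
		      uint8_t ptr_tag, uint8_t val)
{
	assert(addr < TOY_MEM_SIZE);
	if (m->tag_check && m->tag[addr / TOY_GRANULE] != ptr_tag)
		return false;	/* synchronous fault: nothing was written */
	m->data[addr] = val;
	return true;
}

/* Run the store under a recovery policy; return what memory holds after. */
static uint8_t toy_store_with_recovery(struct toy_mem *m, unsigned int addr,
				       uint8_t ptr_tag, uint8_t val,
				       enum toy_recovery policy)
{
	if (!toy_store(m, addr, ptr_tag, val) &&
	    policy == RECOVER_DISABLE_AND_RETRY) {
		m->tag_check = false;			/* give up on checking */
		toy_store(m, addr, ptr_tag, val);	/* restart the access */
	}
	/* With RECOVER_SKIP the faulting store is simply lost. */
	return m->data[addr];
}
```

With mismatching tags, RECOVER_SKIP leaves the old byte in place while the caller carries on as if the store had succeeded. RECOVER_DISABLE_AND_RETRY ends with the expected value, at the cost of turning checking off.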

> We do arm64_skip_faulting_instruction() for software tag-based KASAN
> too, it's not ideal, but works for testing purposes.

IIUC, KASAN only skips over the brk instruction which doesn't have any
other side-effects. Has the actual memory access taken place when it
hits the brk?

> Can we disable MTE, reexecute the instruction, and then reenable MTE,
> or something like that?

If you want to keep MTE enabled, you could single-step the
instruction or execute it out of line, though it's a bit more convoluted
(we have a similar mechanism for kprobes/uprobes).

Another option would be to attempt to set the matching tag in memory,
under the assumption that it is writable (if it's not, maybe it's fine
to panic). Not sure how this interacts with the slub allocator since,
presumably, the logical tag in the pointer is wrong rather than the
allocation one.

Yet another option would be to change the tag in the register and
re-execute but this may confuse the compiler.
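
The last two options differ only in which side of the tag comparison gets rewritten. Below is a minimal user-space sketch, assuming the MTE logical tag sits in pointer bits 59:56. All toy_ names are hypothetical and none of this is kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_SHIFT 56
#define TOY_TAG_MASK  (UINT64_C(0xf) << TOY_TAG_SHIFT)

/* Extract the logical tag carried in the pointer. */
static inline uint8_t toy_ptr_tag(uint64_t ptr)
{
	return (uint8_t)((ptr & TOY_TAG_MASK) >> TOY_TAG_SHIFT);
}

/* Return ptr with its logical tag replaced; the address bits are kept. */
static inline uint64_t toy_ptr_set_tag(uint64_t ptr, uint8_t tag)
{
	assert(tag <= 0xf);
	return (ptr & ~TOY_TAG_MASK) | ((uint64_t)tag << TOY_TAG_SHIFT);
}

/*
 * Option A: rewrite the allocation tag in memory to match the pointer.
 * Only possible if the tag storage is writable; the allocator's view of
 * the object is changed behind its back.
 */
static void toy_fixup_retag_memory(uint8_t *alloc_tag, uint64_t ptr)
{
	*alloc_tag = toy_ptr_tag(ptr);
}

/*
 * Option B: rewrite the tag in the register to match memory, then
 * re-execute. Code that later compares or derives pointers from this
 * register now sees a different value.
 */
static uint64_t toy_fixup_retag_pointer(uint8_t alloc_tag, uint64_t ptr)
{
	return toy_ptr_set_tag(ptr, alloc_tag);
}
```

Either fixup makes the retried access pass the tag check. Option A leaves the pointer value untouched, option B leaves memory metadata untouched.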

> When running in-kernel MTE in production, we'll either panic or
> disable MTE after the first fault. This was controlled by the
> panic_on_mte_fault option Vincenzo initially had.

I prefer to disable MTE, print something and continue, but no panic.

> > We only skip if we emulated it.
>
> I'm not sure I understand this part, what do you mean by emulating?

Executing it out of line or other form of instruction emulation (see
arch/arm64/kernel/probes/simulate-insn.c) so that the access actually
takes place. But you can single-step or experiment with some of the
other tricks above.
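
As a rough illustration of "only skip if emulated", the toy handlers below contrast the two orderings. This is a user-space sketch with hypothetical names (toy_regs, toy_emulate_and_skip, toy_skip_only); it does not reflect how simulate-insn.c is structured, only that AArch64 instructions are 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INSN_SIZE 4u	/* AArch64 instructions are 4 bytes wide */

struct toy_regs {
	uint64_t pc;
};

/*
 * Emulate the faulting store on behalf of the instruction, then step
 * past it. The side effect has happened, so skipping is safe.
 */
static void toy_emulate_and_skip(struct toy_regs *regs, uint8_t *dst,
				 uint8_t val)
{
	assert(regs != NULL && dst != NULL);
	*dst = val;			/* the access actually takes place */
	regs->pc += TOY_INSN_SIZE;	/* now the instruction is complete */
}

/*
 * Step past the faulting store without emulating it. The program
 * counter advances but the store never reaches memory.
 */
static void toy_skip_only(struct toy_regs *regs)
{
	assert(regs != NULL);
	regs->pc += TOY_INSN_SIZE;
}
```

Both handlers resume at the same pc; only the first leaves memory in the state the interrupted code expects.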

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
