Thread (6 messages) 6 messages, 4 authors, 2021-11-02

Re: [PATCH] powerpc/32e: Ignore ESR in instruction storage interrupt handler

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2021-10-29 00:50:58

Excerpts from Daniel Axtens's message of October 29, 2021 8:13 am:
Hi Nick,
quoted
A e5500 machine running a 32-bit kernel sometimes hangs at boot,
seemingly going into an infinite loop of instruction storage interrupts.
The ESR SPR has a value of 0x800000 (store) when this happens, which is
likely set by a previous store. An instruction TLB miss interrupt would
then leave ESR unchanged, and if no PTE exists it calls directly to the
instruction storage interrupt handler without changing ESR.
I hadn't heard of the ESR before and your patch just uses the acronym:
apparently it is the Exception Syndrome Register. In ISA 2.07 it's in
Section 7.2.13 of Book III-E. (e5500 implements 2.06 but I assume it's
roughly the same there.)

The ISA says that ESR_ST is set for the following types of exception:
 - Alignment
 - Data Storage
 - Data TLB
 - LRAT Error

So it makes sense to assume that it was not set by the instruction
storage exception. (I see you have a discussion as to how that might
occur in
https://lore.kernel.org/linuxppc-dev/1635413197.83rhc4s3fc.astroid@bobo.none/#t (local)
and you concluded that in the comment you add that we arrive here via
the TLB handler jumping to the ISI.)
I think that's most likely. I don't know book E arch very well, *maybe* 
you could have an ISI which does not change ESR without violating the
letter of the architecture, but not sure. Either way, this fix should 
take care of both.
quoted
access_error() does not cause a segfault due to a store to a read-only
vma because is_exec is true. Most subsequent fault handling does not
check for a write fault on a read-only vma, and might do strange things
like create a writeable PTE or call page_mkwrite on a read only vma or
file. It's not clear what happens here to cause the infinite faulting in
this case, a fault handler failure or low level PTE or TLB handling.
I'm just trying to unpick this:

 - INSTRUCTION_STORAGE_EXCEPTION ends up branching to do_page_fault ->
   __do_page_fault -> ___do_page_fault

 - ___do_page_fault has is_exec = true because the exception is a
   instruction storage interrupt

 - ___do_page_fault also decides that is_write = true because the
   ESR_DST bit is set.

This puts us in a bad position because we're taking some information
from the current exception and some information from a previous
exception, so it makes sense that things would go wrong from here!
Yeah, we miss the store into read-only vma check because is_exec which 
causes us to go into write path of the core handle_mm_fault. Although 
even if we caught it and segfaulted obviously that's still not the right 
thing to do.
quoted
In any case this can be fixed by having the instruction storage
interrupt zero regs->dsisr rather than storing the ESR value to it.
Is it valid to just ignore the contents of ESR for an Instruction
Storage exception?

The 2.07 ISA at least says that the following ESR bits can be set by an
Instruction Storage exception:
 - Byte Ordering
 - TLB Inelligible
 - Page Table

It sounds from
quoted
+ * In any case, do_page_fault for BOOK3E does not use ESR and always expects
+ * dsisr to be 0. ESR_DST from a prior store in particular would confuse fault
+ * handling.
that we don't actually ever check any of those three bits in the
do_page_fault path. reg_booke.h defines ESR_BO but the definition is
never used, and there is no ESR_TLBI or ESR_PT constant defined.
Yeah that's right. I don't quite know what we would do with the other
exception types. Possibly just check them in page_fault_is_bad and cause 
a sigbus. Maybe there is some low level fixup that would be possible.
But nothing is implemented.
So much as it seems a bit dodgy to me, I guess it is right to say that
we're not changing behaviour by setting dsisr to 0 and just ignoring
those 3 bits? Should we document that in the comment?
The comment does say do_page_fault expects it to be 0. Did you have more 
in mind? I can tweak the comment.
I probably would have masked ESR_DST but I guess extra bit-twiddling in
an interrupt path when zeroing would do is probably not worth it for
theoretical correctness?
Well the ESR is completely un-set so none of the bits make sense at this
point. Reading it is jsut overhead.

If we wanted to do it really "cleanly", the TLB error interrupt handler
might set ESR before calling here, or it might call a few instructions
later with the register already set to a sane ESR value. But that would
really only matter if do_page_fault started doing anything useful with
the other bits.

Thanks,
Nick
Kind regards,
Daniel
quoted
Link: https://lore.kernel.org/linuxppc-dev/1635306738.0z8wt7619v.astroid@bobo.none/ (local)
Fixes: a01a3f2ddbcd ("powerpc: remove arguments from fault handler functions")
Reported-by: Jacques de Laval <redacted>
Tested-by: Jacques de Laval <redacted>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/head_booke.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index e5503420b6c6..ef8d1b1c234e 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -465,12 +465,21 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 	bl	do_page_fault;						      \
 	b	interrupt_return
 
+/*
+ * Instruction TLB Error interrupt handlers may call InstructionStorage
+ * directly without clearing ESR, so the ESR at this point may be left over
+ * from a prior interrupt.
+ *
+ * In any case, do_page_fault for BOOK3E does not use ESR and always expects
+ * dsisr to be 0. ESR_DST from a prior store in particular would confuse fault
+ * handling.
+ */
 #define INSTRUCTION_STORAGE_EXCEPTION					      \
 	START_EXCEPTION(InstructionStorage)				      \
-	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
-	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
+	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
+	li	r5,0;			/* Store 0 in regs->esr (dsisr) */    \
 	stw	r5,_ESR(r11);						      \
-	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
+	stw	r12, _DEAR(r11);	/* Set regs->dear (dar) to SRR0 */    \
 	prepare_transfer_to_handler;					      \
 	bl	do_page_fault;						      \
 	b	interrupt_return
-- 
2.23.0
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help