Thread: 7 messages, 3 authors, 2013-08-27

Re: Critical Interrupt Input

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: 2013-08-27 22:14:33

On Tue, 2013-08-27 at 15:11 -0700, Henry Bausley wrote:
Both methods you described seem to work. We are currently using the
method of clearing the partially written TLB. Seems to be working but
we're still testing.  Thanks.  
Feel free to send us a patch for review :-)

Cheers,
Ben.
.
.
.
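	mfspr	r5,SPRN_CSRR0;			/* r5 = interrupted PC */
	lis	r12,finish_tlb_load_44x@h
	ori	r12,r12,finish_tlb_load_44x@l;	/* r12 = start of finish_tlb_load_44x */
	addi	r11,r12,finish_tlb_load_44x_end-finish_tlb_load_44x;	/* r11 = end */
	cmplw	cr0,r5,r12;
	cmplw	cr1,r5,r11;
	ble	cr0,3f;				/* PC at or before start: skip */
	bge	cr1,3f;				/* PC at or past end: skip */
	li	r12,0;
	mr	r5,r11				/* r5 = address just past the window */
	tlbwe	r12,r13,PPC44x_TLB_XLAT;	/* Clear XLAT */
	tlbwe	r12,r13,PPC44x_TLB_PAGEID;	/* Clear PAGEID */
	tlbwe	r12,r13,PPC44x_TLB_ATTRIB;	/* Clear ATTRIB */
	isync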
.
.
.


On Wed, 2013-08-21 at 09:08 +1000, Benjamin Herrenschmidt wrote:
On Tue, 2013-08-20 at 15:48 -0700, Henry Bausley wrote:
Ben,


After your hints I suspected the read of a real world i/o variable *piom
which came from ioremap_nocache in the 3 line critical interrupt handler

void critintr_handler(void *dev)
{
  critintrcount++;          // increment a variable
  iodata = *piom;           // read an I/O location 
  mtdcr(0x0c0, 0x00002000); // clear critical interrupt 
} 

is what caused the problem. Commenting it out seems to make the system stable.  
Right, I would definitely do that. BTW, you may want to use proper IO
accessors while at it, to get the right memory barriers etc...
quoted
This led us to disable the critical interrupt when in the
DataTLBError44x and InstructionTLBError44x exceptions.  Now the critical
interrupt handler seems to make things more stable when reading real
world i/o for our application.


  /* Data TLB Error Interrupt */
  START_EXCEPTION(DataTLBError44x)
  mtspr	SPRN_SPRG_WSCRATCH0, r10  /* Save some working */
+  mfmsr r10                      /*  Disable the */
+  rlwinm r10,r10,0,15,13         /*  MSR's CE bit */
+  mtmsr r10                     
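
For readers unfamiliar with the rlwinm idiom in the patch above, here is a
small user-space C model of the mask it generates. It is only a sketch: the
helper names rlwinm_mask and rlwinm are made up for illustration, and MSR_CE
is assumed to be 0x00020000 (IBM bit 14), as defined for Book E in the kernel
headers. With MB=15 and ME=13 the mask wraps around and covers every bit
except bit 14, so the instruction clears only CE.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_CE 0x00020000u /* assumed Book E value: IBM bit 14 */

/* Mask selected by rlwinm's MB/ME fields (IBM numbering: bit 0 is the MSB). */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* bits mb..31 set */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me); /* bits 0..me set  */
    /* mb <= me: one contiguous run; mb > me: the run wraps from bit 31 to bit 0 */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm rA,rS,SH,MB,ME: rotate rS left by SH, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    sh &= 31u;
    uint32_t rot = sh ? ((rs << sh) | (rs >> (32u - sh))) : rs;
    return rot & rlwinm_mask(mb, me);
}
```

In this model rlwinm(msr, 0, 15, 13) is equivalent to msr & ~MSR_CE.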


Do you see any potential problems with this approach?

If so can you advise us on how to better take care of this.
 - You potentially still have an exposure ... between the mtspr to
scratch and the mfmsr, a crit can occur, causing a re-entrancy which
would then clobber the scratch register. That can be handled by saving
that scratch SPRG into the stack frame on entry/exit from the crit
interrupt. Look at crit_transfer_to_handler, how it already handles
MMUCR:

	mfspr	r0,SPRN_MMUCR
	stw	r0,MMUCR(r11)

Probably add saving of the SPRG_WSCRATCH0 in there (need to add a frame
slot for it) and do the restore in RESTORE_MMU_REGS

 - You need to handle Instruction TLB miss as well

 - You add overhead to the TLB miss handlers which are fairly
performance critical pieces of code. You might be able to alleviate
that by making the whole thing support re-entrancy properly but that's
harder. To do that you would have to:

    * Save *all* the SPRGs used by the TLB miss during crit entry/exit

    * Detect in crit_transfer_to_handler (check the CSRR0 bounds) that 
      the crit code interrupted finish_tlb_load_44x before or at the
      last tlbwe instruction. In that case, immediately clear the 
      partially written TLB entry (index in r13) and change the
      return address to skip right past the last tlbwe.
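
The CSRR0 bounds check in the last point can be modelled in plain C. This is
only a sketch under assumptions, not kernel code: the struct and function
names are hypothetical, the addresses are passed in as plain integers, and
the real check would live in assembly in crit_transfer_to_handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the window that must not be resumed. */
struct tlb_load_window {
    uint32_t start;      /* address of finish_tlb_load_44x        */
    uint32_t last_tlbwe; /* address of its last tlbwe instruction */
};

/* What the crit entry code should do about the interrupted context. */
struct crit_fixup {
    bool     clear_entry; /* invalidate the partially written TLB entry */
    uint32_t resume_pc;   /* value to return to (the new CSRR0)         */
};

static struct crit_fixup crit_check_tlb_load(uint32_t csrr0,
                                             const struct tlb_load_window *w)
{
    struct crit_fixup f = { false, csrr0 }; /* default: resume unchanged */

    assert(w->start <= w->last_tlbwe);
    /* Interrupted before or at the last tlbwe: the entry may be half written. */
    if (csrr0 >= w->start && csrr0 <= w->last_tlbwe) {
        f.clear_entry = true;
        f.resume_pc = w->last_tlbwe + 4; /* PowerPC instructions are 4 bytes */
    }
    return f;
}
```

Once the entry has been cleared and the remaining tlbwe instructions skipped,
the faulting access should simply miss again after the handler returns, so
the TLB load is replayed from scratch rather than resumed half way.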

Cheers,
Ben.

On Tue, 2013-08-20 at 06:56 +1000, Benjamin Herrenschmidt wrote:
On Mon, 2013-08-19 at 12:00 -0700, Henry Bausley wrote:
Support does appear to be present but there is a problem returning
back to user space I suspect.
Probably a problem with TLB misses vs. crit interrupts.

A critical interrupt can re-enter a TLB miss.

I can see two potential issues there:

 - A bug where we don't properly restore "something" (I thought we did
save and restore MMUCR though, but it's worth double-checking that it works
properly) across the crit entry/exit

 - Something in your crit code causing a TLB miss (the
kernel .text/.data/.bss should be bolted but anything else can). We
don't currently support re-entering the TLB miss that way.

If we were to support the latter, we'd need to detect on entering a crit
that the PC is within the TLB miss handler, and set up a return context
to the original instruction (replay the miss) rather than trying to
resume it.
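
To illustrate the first issue above (state that is not saved and restored
across crit entry/exit), here is a toy user-space C model. Everything in it
is hypothetical: scratch_spr stands in for any register shared by the TLB
miss handler and the crit path, and the function names are invented for the
example.

```c
#include <stdint.h>

/* Toy stand-in for a scratch register shared by both interrupt levels. */
static uint32_t scratch_spr;

/* Hypothetical crit handler body that ends up using the same scratch
 * register, for example by taking a TLB miss of its own. */
static void crit_body(void)
{
    scratch_spr = 0xDEADBEEFu;
}

/* Crit entry/exit that does not preserve the shared register. */
static void crit_entry_unsafe(void)
{
    crit_body();
}

/* Crit entry/exit that saves the register in its frame and restores it. */
static void crit_entry_safe(void)
{
    uint32_t saved = scratch_spr; /* save on entry */
    crit_body();
    scratch_spr = saved;          /* restore on exit */
}

/* Models a TLB miss handler: park r10 in scratch, get interrupted,
 * then reload r10 from scratch. Returns the value reloaded. */
static uint32_t tlb_miss(uint32_t r10, void (*crit)(void))
{
    scratch_spr = r10;
    crit();              /* critical interrupt arrives here */
    return scratch_spr;
}
```

With crit_entry_safe the handler gets its own value back; with
crit_entry_unsafe it reloads whatever the crit path left behind.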

Cheers,
Ben.
quoted
What fails is it causes Linux user space programs to get Segmentation
errors.
Issuing a simple ls causes a segmentation fault sometimes.  The shell
gets terminated 
and you cannot log back in.  INIT: Id "T0" respawning too fast:
disabled for 5 minutes pops up.

However, the critical interrupt handler keeps running.  I know this by
adding the reading 
of a physical I/O location in the handler and can see it is being read
on the scope.


The only code in the handler is below.

void critintr_handler(void *dev)
{
  critintrcount++;          // increment a variable
  iodata = *piom;           // read an I/O location 
  mtdcr(0x0c0, 0x00002000); // clear critical interrupt
}


Below is a log of the type of crashes that occur:

root@10.34.9.213:/opt/ppmac/ktest# ls
Segmentation fault
root@10.34.9.213:/opt/ppmac/ktest# ls
Segmentation fault
root@10.34.9.213:/opt/ppmac/ktest# ls
Makefile        ktest.c    ktest.ko     ktest.mod.o  modules.order
Module.symvers  ktest.cbp  ktest.mod.c  ktest.o
root@10.34.9.213:/opt/ppmac/ktest# ls

Debian GNU/Linux 7 powerpmac ttyS0

powerpmac login: root

Debian GNU/Linux 7 powerpmac ttyS0

powerpmac login: root

Debian GNU/Linux 7 powerpmac ttyS0

powerpmac login: root

Debian GNU/Linux 7 powerpmac ttyS0

powerpmac login: root
Password: 
Last login: Thu Nov 30 20:42:16 UTC 1933 on ttyS0
Linux powerpmac 3.2.21-aspen_2.01.09 #10 Mon Aug 19 08:49:12 PDT 2013
ppc

The programs included with the Debian GNU/Linux system are free
software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
INIT: Id "T0" respawning too fast: disabled for 5 minutes


______________________________________________________________________
From: "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
Sent: Saturday, August 17, 2013 3:05 PM
To: "Kumar Gala" <redacted>
Cc: linuxppc-dev@lists.ozlabs.org, hbausley@deltatau.com
Subject: Re: Critical Interrupt Input

On Fri, 2013-08-16 at 06:04 -0500, Kumar Gala wrote:
The 44x low level code needs to handle exception stacks properly for
this to work. Since it's possible to have a critical exception occur
while in a normal exception level, you have to have proper saving of
additional register state and a stack frame for the critical
exception, etc. I'm not sure if that was ever done for 44x.
Don't 44x and FSL BookE share the same macros ? I would think 44x does
indeed implement the same crit support as e500...

What does the crash look like ?

Ben.


_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


  ­­  



