Thread (25 messages, 4 authors, 2013-04-02)

RE: [PATCH V4] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx

From: Jia Hongtao-B38951 <hidden>
Date: 2013-03-06 08:29:06

-----Original Message-----
From: Wood Scott-B07421
Sent: Wednesday, March 06, 2013 2:48 AM
To: Jia Hongtao-B38951
Cc: Wood Scott-B07421; Stuart Yoder; linuxppc-dev@lists.ozlabs.org; Kumar Gala
Subject: Re: [PATCH V4] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx

> On 03/05/2013 04:12:30 AM, Jia Hongtao-B38951 wrote:
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, March 05, 2013 7:46 AM
> > To: Stuart Yoder
> > Cc: Jia Hongtao-B38951; linuxppc-dev@lists.ozlabs.org; Kumar Gala
> > Subject: Re: [PATCH V4] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx
> >
> > > On 03/04/2013 10:16:10 AM, Stuart Yoder wrote:
> > > > On Mon, Mar 4, 2013 at 2:40 AM, Jia Hongtao [off-list ref] wrote:
> > > > > A PCIe erratum of mpc85xx may cause a core hang when a PCIe link
> > > > > goes down. When the link goes down, non-posted transactions issued
> > > > > via the ATMU that require completion result in an instruction stall.
> > > > > At the same time, a machine-check exception is generated to the core
> > > > > to allow further processing by the handler. We implement the handler,
> > > > > which skips the instruction that caused the stall.
> > > > Can you explain at a high level how just skipping an instruction solves
> > > > anything? If you just skip a load/store and continue like nothing is
> > > > wrong, isn't your system possibly in a really bad state?
> > > If the instruction was a load, we probably at least want to fill the
> > > destination register with 0xffffffff or similar.
> > You discussed this with Liu Shuo about a year ago. Here is the log:
> >
> > "
> > On 02/01/2012 02:18 AM, shuo.liu@freescale.com wrote:
> > > v3: Skip the instruction only. Don't access the user space memory in
> > >     machine check.
> >
> > It may be the least bad option for now, but be aware that there's a
> > small chance that this will cause a leak of sensitive information
> > (such as a piece of a crypto key that happened to be sitting in the
> > register to be loaded into).
> > "
>
> Yes, that's (one reason) why you'd want to fill in a known value.  Note
> the "for now". :-)
>
> -Scott
I think there is no overwhelming reason to fill the destination register
with 0xffffffff.

There is a small chance that 0xffffffff would be treated as regular data
rather than as an error indication.

Also, setting this register may affect user space under certain
circumstances.

So I think simply skipping the instruction is an acceptable option for
this fix.

-Hongtao.
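The two policies debated in this thread (skip the stalled instruction only, as in V4, versus also filling a load's destination register with 0xffffffff, as Scott suggests) can be compared in a small user-space simulation. This is a minimal sketch under stated assumptions, not the patch's code: struct mock_regs (loosely modeled on the nip and gpr[] fields of the powerpc struct pt_regs), enum mcheck_policy, decode_load_rt() and mcheck_recover() are hypothetical names. The decoder recognizes only a handful of non-update integer loads (lwz, lbz, lhz, lha, lwzx, lbzx, lhzx), and instructions are assumed to be a fixed 4 bytes long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register state; loosely modeled
 * on the nip and gpr[] fields of the powerpc struct pt_regs. */
struct mock_regs {
	uint32_t nip;     /* address of the stalled instruction */
	uint32_t gpr[32]; /* general-purpose registers r0..r31 */
};

/* Hypothetical switch between the two options discussed in the thread. */
enum mcheck_policy {
	SKIP_ONLY,     /* V4 behaviour: step over the instruction */
	SKIP_AND_FILL, /* also give a load's target a known value */
};

/* Return true and store RT if insn is one of a few non-update integer
 * loads; a real handler would need a much fuller decoder. */
static bool decode_load_rt(uint32_t insn, unsigned *rt)
{
	unsigned opcd = insn >> 26;        /* primary opcode, bits 0-5 */
	unsigned xo = (insn >> 1) & 0x3ff; /* X-form extended opcode, bits 21-30 */

	switch (opcd) {
	case 32: /* lwz */
	case 34: /* lbz */
	case 40: /* lhz */
	case 42: /* lha */
		break;
	case 31:
		if (xo == 23 || xo == 87 || xo == 279) /* lwzx, lbzx, lhzx */
			break;
		return false;
	default:
		return false; /* stores and everything else: not a load */
	}
	*rt = (insn >> 21) & 0x1f; /* RT field, bits 6-10 */
	return true;
}

/* Recovery step: optionally overwrite a load's destination register,
 * then resume at the next instruction. */
static void mcheck_recover(struct mock_regs *regs, uint32_t insn,
			   enum mcheck_policy policy)
{
	unsigned rt;

	assert(regs != NULL);
	if (policy == SKIP_AND_FILL && decode_load_rt(insn, &rt))
		regs->gpr[rt] = 0xffffffffu; /* known value, not stale data */
	regs->nip += 4; /* assumes fixed 4-byte instructions */
}
```

For example, with insn = 0x81230008 (lwz r9, 8(r3)), SKIP_ONLY advances nip by 4 and leaves gpr[9] holding whatever it held before, which is the information-leak concern raised in the 2012 log; SKIP_AND_FILL also sets gpr[9] to 0xffffffff. A store such as 0x91230008 (stw r9, 8(r3)) is simply skipped under either policy.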