Thread (3 messages) 3 messages, 3 authors, 2019-06-29

Re: [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR

From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2019-06-29 12:27:36

Nicholas Piggin [off-list ref] writes:
The early machine check runs in real mode, so locking is unnecessary.
Worse, the windup does not restore AMR, so this can result in a false
KUAP fault after a recoverable machine check hits inside a user copy
operation.

Fix this similarly to HMI by just avoiding the kuap lock in the
early machine check handler (it will be set by the late handler that
runs in virtual mode if that runs). If the virtual mode handler is
reached, it will lock and restore the AMR.
For the archives, this is how I tested this.

Build with KUAP enabled, disassemble load_elf_binary(), in there is a
call to __copy_tofrom_user(), preceded by a write to AMR, eg:

c00000000045eec8:	a6 03 3d 7d 	mtspr   29,r9
c00000000045eecc:	2c 01 00 4c 	isync
c00000000045eed0:	78 93 44 7e 	mr      r4,r18
c00000000045eed4:	78 e3 83 7f 	mr      r3,r28
c00000000045eed8:	b1 c1 c3 4b 	bl      c00000000009b088 <__copy_tofrom_user+0x8>


Boot mambo using skiboot.tcl, break into the mambo shell. Add a
breakpoint at the branch to __copy_tofrom_user():

  systemsim % b 0xc00000000045eed8
  breakpoint set at [0:0:0]: 0xc00000000045eed8 (0xC00000000045EED8) Enc:0x00000000 : INVALID

Continue, run `ls` in the system shell and it should break at your breakpoint:

  systemsim % c
  4439260000000: [0:0]: (PC:0x00007FFFB43B2F00) :      2.1 Mega-Inst/Sec :      2.1 Mega-Cycles/Sec [1 Zaps  0 PA-Zaps] *ON*  [0:0] pri=4 extra=0
  4440009381609: (7208208132): # ls
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl      $-0x3C3E50
  INFO: 4440936223969: (8135050536): ** Execution stopped: user (tcl),  **
  4440936223969: ** finished running 8135050536 instructions **

Print the AMR, it has been cleared:

  systemsim % p amr
  0x0000000000000000

Then inject a machine check exception, and continue:
  systemsim % exc_mce
  systemsim % c
  4440936231861: (8135058428): [ 8673.510176] Disabling lock debugging due to kernel taint
  4440936246871: (8135073438): [ 8673.510205] MCE: CPU0: machine check (Warning) Host TLB Multihit [Recovered]
  4440936266680: (8135093247): [ 8673.510244] MCE: CPU0: NIP: [c00000000045eed8] load_elf_binary+0xef8/0x1970
  4440936282657: (8135109224): [ 8673.510275] MCE: CPU0: Probable Software error (some chance of hardware cause)
  [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl      $-0x3C3E50
  INFO: 4440936296116: (8135122683): ** Execution stopped: user (tcl),  **
  4440936296116: ** finished running 8135122683 instructions **

Now we're back at our breakpoint. Continue again and we should get an
oops due to a bad AMR fault:

  systemsim % c
  4440936301692: (8135128259): [ 8673.510312] ------------[ cut here ]------------
  4440936321016: (8135147583): [ 8673.510336] Bug: Write fault blocked by AMR!
  4440936331347: (8135157914): [ 8673.510350] WARNING: CPU: 0 PID: 95 at arch/powerpc/include/asm/book3s/64/kup-radix.h:102 __do_page_fault+0x604/0xe60
  4440936352510: (8135179077): [ 8673.510410] Modules linked in:
  4440936365222: (8135191789): [ 8673.510436] CPU: 0 PID: 95 Comm: ls Tainted: G   M              5.2.0-rc2-gcc-8.2.0 #273
  4440936383775: (8135210342): [ 8673.510473] NIP:  c0000000000716b4 LR: c0000000000716b0 CTR: c000000000ca88b0
  4440936401995: (8135228562): [ 8673.510508] REGS: c0000000ec883530 TRAP: 0700   Tainted: G   M               (5.2.0-rc2-gcc-8.2.0)
  4440936430641: (8135257208): [ 8673.510545] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28002422  XER: 20040000
  4440936498754: (8135325321): [ 8673.510597] CFAR: c00000000011b8e4 IRQMASK: 1 
  4440936505159: (8135331726): [ 8673.510597] GPR00: c0000000000716b0 c0000000ec8837c0 c0000000015f4900 0000000000000020 
  4440936515814: (8135342381): [ 8673.510597] GPR04: c000000001824550 0000000000000000 746c756166206574 64656b636f6c6220 
  4440936528594: (8135355161): [ 8673.510597] GPR08: 00000000fed30000 c000000001130de8 0000000000000000 9000000030001033 
  4440936541374: (8135367941): [ 8673.510597] GPR12: 0000000000002000 c0000000018e0000 0000000080000000 00007fffe2e3de09 
  4440936554154: (8135380721): [ 8673.510597] GPR16: c000000000ed2c50 0000000010000000 c000000000ed2c50 00000000100d3648 
  4440936564809: (8135391376): [ 8673.510597] GPR20: c0000000f0968b00 00000000100e3648 00007fff930a0000 0000000002000000 
  4440936577589: (8135404156): [ 8673.510597] GPR24: 0000000002000000 c0000000ee830600 0000000000000301 00007fffe2e3de09 
  4440936590369: (8135416936): [ 8673.510597] GPR28: 0000000000000000 000000000a000000 0000000000000000 c0000000ec883900 
  4440936611699: (8135438266): [ 8673.510918] NIP [c0000000000716b4] __do_page_fault+0x604/0xe60
  4440936628747: (8135455314): [ 8673.510951] LR [c0000000000716b0] __do_page_fault+0x600/0xe60
  4440936642325: (8135468892): [ 8673.510978] Call Trace:
  4440936655614: (8135482181): [ 8673.511000] [c0000000ec8837c0] [c0000000000716b0] __do_page_fault+0x600/0xe60 (unreliable)
  4440936677874: (8135504441): [ 8673.511045] [c0000000ec883890] [c00000000000b0d4] handle_page_fault+0x18/0x38
  4440936700658: (8135527225): [ 8673.511091] --- interrupt: 301 at __copy_tofrom_user_power7+0x230/0x7ac
  4440936709188: (8135535755): [ 8673.511091]     LR = load_elf_binary+0xefc/0x1970
  4440936728082: (8135554649): [ 8673.511142] [c0000000ec883b90] [c00000000045ee80] load_elf_binary+0xea0/0x1970 (unreliable)
  4440936750368: (8135576935): [ 8673.511187] [c0000000ec883c90] [c0000000003d2f88] search_binary_handler.part.12+0xb8/0x2b0
  4440936772446: (8135599013): [ 8673.511230] [c0000000ec883d20] [c0000000003d3934] __do_execve_file.isra.14+0x684/0xa10
  4440936793891: (8135620458): [ 8673.511272] [c0000000ec883df0] [c0000000003d41b8] sys_execve+0x38/0x50
  4440936813829: (8135640396): [ 8673.511311] [c0000000ec883e20] [c00000000000bdf4] system_call+0x5c/0x70
  4440936828817: (8135655384): [ 8673.511340] Instruction dump:
  4440936848134: (8135674701): [ 8673.511361] 60000000 2fb70000 e93f0168 419e0620 2fa90000 409cfba4 3c82ff8e 38846b88 
  4440936874244: (8135700811): [ 8673.511412] 3c62ff8e 38636c98 480aa1d1 60000000 <0fe00000> e80100e0 3b80000b eae10088 
  4440936891327: (8135717894): [ 8673.511464] ---[ end trace 0698ac8ff1068918 ]---
  4440938377906: (8137204473): Segmentation fault


Apply the fix, retest, and no oops is seen.

cheers
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help