Re: [PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR
From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2019-06-29 12:27:36
Nicholas Piggin [off-list ref] writes:
The early machine check runs in real mode, so locking is unnecessary. Worse, the windup does not restore AMR, so this can result in a false KUAP fault after a recoverable machine check hits inside a user copy operation. Fix this similarly to HMI by just avoiding the kuap lock in the early machine check handler (it will be set by the late handler that runs in virtual mode if that runs). If the virtual mode handler is reached, it will lock and restore the AMR.
For the archives, this is how I tested this. Build with KUAP enabled, disassemble load_elf_binary(), in there is a call to __copy_tofrom_user(), preceded by a write to AMR, eg: c00000000045eec8: a6 03 3d 7d mtspr 29,r9 c00000000045eecc: 2c 01 00 4c isync c00000000045eed0: 78 93 44 7e mr r4,r18 c00000000045eed4: 78 e3 83 7f mr r3,r28 c00000000045eed8: b1 c1 c3 4b bl c00000000009b088 <__copy_tofrom_user+0x8> Boot mambo using skiboot.tcl, break into the mambo shell. Add a breakpoint at the branch to __copy_tofrom_user(): systemsim % b 0xc00000000045eed8 breakpoint set at [0:0:0]: 0xc00000000045eed8 (0xC00000000045EED8) Enc:0x00000000 : INVALID Continue, run `ls` in the system shell and it should break at your breakpoint: systemsim % c 4439260000000: [0:0]: (PC:0x00007FFFB43B2F00) : 2.1 Mega-Inst/Sec : 2.1 Mega-Cycles/Sec [1 Zaps 0 PA-Zaps] *ON* [0:0] pri=4 extra=0 4440009381609: (7208208132): # ls [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50 INFO: 4440936223969: (8135050536): ** Execution stopped: user (tcl), ** 4440936223969: ** finished running 8135050536 instructions ** Print the AMR, it has been cleared: systemsim % p amr 0x0000000000000000 Then inject a machine check exception, and continue: systemsim % exc_mce systemsim % c 4440936231861: (8135058428): [ 8673.510176] Disabling lock debugging due to kernel taint 4440936246871: (8135073438): [ 8673.510205] MCE: CPU0: machine check (Warning) Host TLB Multihit [Recovered] 4440936266680: (8135093247): [ 8673.510244] MCE: CPU0: NIP: [c00000000045eed8] load_elf_binary+0xef8/0x1970 4440936282657: (8135109224): [ 8673.510275] MCE: CPU0: Probable Software error (some chance of hardware cause) [0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50 INFO: 4440936296116: (8135122683): ** Execution stopped: user (tcl), ** 4440936296116: ** finished running 8135122683 instructions ** Now we're back at our breakpoint. Continue again and we should get an oops due to a bad AMR fault: systemsim % c 4440936301692: (8135128259): [ 8673.510312] ------------[ cut here ]------------ 4440936321016: (8135147583): [ 8673.510336] Bug: Write fault blocked by AMR! 4440936331347: (8135157914): [ 8673.510350] WARNING: CPU: 0 PID: 95 at arch/powerpc/include/asm/book3s/64/kup-radix.h:102 __do_page_fault+0x604/0xe60 4440936352510: (8135179077): [ 8673.510410] Modules linked in: 4440936365222: (8135191789): [ 8673.510436] CPU: 0 PID: 95 Comm: ls Tainted: G M 5.2.0-rc2-gcc-8.2.0 #273 4440936383775: (8135210342): [ 8673.510473] NIP: c0000000000716b4 LR: c0000000000716b0 CTR: c000000000ca88b0 4440936401995: (8135228562): [ 8673.510508] REGS: c0000000ec883530 TRAP: 0700 Tainted: G M (5.2.0-rc2-gcc-8.2.0) 4440936430641: (8135257208): [ 8673.510545] MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 28002422 XER: 20040000 4440936498754: (8135325321): [ 8673.510597] CFAR: c00000000011b8e4 IRQMASK: 1 4440936505159: (8135331726): [ 8673.510597] GPR00: c0000000000716b0 c0000000ec8837c0 c0000000015f4900 0000000000000020 4440936515814: (8135342381): [ 8673.510597] GPR04: c000000001824550 0000000000000000 746c756166206574 64656b636f6c6220 4440936528594: (8135355161): [ 8673.510597] GPR08: 00000000fed30000 c000000001130de8 0000000000000000 9000000030001033 4440936541374: (8135367941): [ 8673.510597] GPR12: 0000000000002000 c0000000018e0000 0000000080000000 00007fffe2e3de09 4440936554154: (8135380721): [ 8673.510597] GPR16: c000000000ed2c50 0000000010000000 c000000000ed2c50 00000000100d3648 4440936564809: (8135391376): [ 8673.510597] GPR20: c0000000f0968b00 00000000100e3648 00007fff930a0000 0000000002000000 4440936577589: (8135404156): [ 8673.510597] GPR24: 0000000002000000 c0000000ee830600 0000000000000301 00007fffe2e3de09 4440936590369: (8135416936): [ 8673.510597] GPR28: 0000000000000000 000000000a000000 0000000000000000 c0000000ec883900 4440936611699: (8135438266): [ 8673.510918] NIP [c0000000000716b4] __do_page_fault+0x604/0xe60 4440936628747: (8135455314): [ 8673.510951] LR [c0000000000716b0] __do_page_fault+0x600/0xe60 4440936642325: (8135468892): [ 8673.510978] Call Trace: 4440936655614: (8135482181): [ 8673.511000] [c0000000ec8837c0] [c0000000000716b0] __do_page_fault+0x600/0xe60 (unreliable) 4440936677874: (8135504441): [ 8673.511045] [c0000000ec883890] [c00000000000b0d4] handle_page_fault+0x18/0x38 4440936700658: (8135527225): [ 8673.511091] --- interrupt: 301 at __copy_tofrom_user_power7+0x230/0x7ac 4440936709188: (8135535755): [ 8673.511091] LR = load_elf_binary+0xefc/0x1970 4440936728082: (8135554649): [ 8673.511142] [c0000000ec883b90] [c00000000045ee80] load_elf_binary+0xea0/0x1970 (unreliable) 4440936750368: (8135576935): [ 8673.511187] [c0000000ec883c90] [c0000000003d2f88] search_binary_handler.part.12+0xb8/0x2b0 4440936772446: (8135599013): [ 8673.511230] [c0000000ec883d20] [c0000000003d3934] __do_execve_file.isra.14+0x684/0xa10 4440936793891: (8135620458): [ 8673.511272] [c0000000ec883df0] [c0000000003d41b8] sys_execve+0x38/0x50 4440936813829: (8135640396): [ 8673.511311] [c0000000ec883e20] [c00000000000bdf4] system_call+0x5c/0x70 4440936828817: (8135655384): [ 8673.511340] Instruction dump: 4440936848134: (8135674701): [ 8673.511361] 60000000 2fb70000 e93f0168 419e0620 2fa90000 409cfba4 3c82ff8e 38846b88 4440936874244: (8135700811): [ 8673.511412] 3c62ff8e 38636c98 480aa1d1 60000000 <0fe00000> e80100e0 3b80000b eae10088 4440936891327: (8135717894): [ 8673.511464] ---[ end trace 0698ac8ff1068918 ]--- 4440938377906: (8137204473): Segmentation fault Apply the fix, retest, and no oops is seen. cheers