Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
From: Thomas D. <hidden>
Date: 2015-08-18 17:12:07
Hi,

Jiri Slaby wrote:
> On 08/18/2015, 12:55 AM, Thomas D wrote:
> > From: Andy Lutomirski <luto@kernel.org>
> >
> > commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
> >
> > Returning to userspace is tricky: IRET can fail, and ESPFIX can
> > rearrange the stack prior to IRET.
> >
> > The NMI nesting fixup relies on a precise stack layout and
> > atomic IRET.  Rather than trying to teach the NMI nesting fixup
> > to handle ESPFIX and failed IRET, punt: run NMIs that came from
> > user mode on the normal kernel stack.
> >
> > This will make some nested NMIs visible to C code, but the C
> > code is okay with that.
> >
> > As a side effect, this should speed up perf: it eliminates an
> > RDMSR when NMIs come from user mode.
> >
> > Signed-off-by: Andy Lutomirski <luto@kernel.org>
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > Reviewed-by: Borislav Petkov <redacted>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <redacted>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> >  arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 73 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index 28b08345..bd7d8aa 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -1715,19 +1715,88 @@ ENTRY(nmi)
> >  	 * a nested NMI that updated the copy interrupt stack frame, a
> >  	 * jump will be made to the repeat_nmi code that will handle the second
> >  	 * NMI.
> > +	 *
> > +	 * However, espfix prevents us from directly returning to userspace
> > +	 * with a single IRET instruction.  Similarly, IRET to user mode
> > +	 * can fault.  We therefore handle NMIs from user space like
> > +	 * other IST entries.
> >  	 */
> >
> >  	/* Use %rdx as out temp variable throughout */
> >  	pushq_cfi %rdx
> >  	CFI_REL_OFFSET rdx, 0
> >
> > +	testb	$3, CS-RIP+8(%rsp)
> > +	jz	.Lnmi_from_kernel
> > +
> > +	/*
> > +	 * NMI from user mode.  We need to run on the thread stack, but we
> > +	 * can't go through the normal entry paths: NMIs are masked, and
> > +	 * we don't want to enable interrupts, because then we'll end
> > +	 * up in an awkward situation in which IRQs are on but NMIs
> > +	 * are off.
> > +	 */
> > +
> > +	SWAPGS
> > +	cld
> > +	movq	%rsp, %rdx
> > +	movq	PER_CPU_VAR(kernel_stack), %rsp
>
> I think you are wasting stack space here. With kernel_stack, you should
> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
> registers is pre-reserved at kernel_stack already. (Or use movq instead
> of the 5 pushq below.)
>
> Why don't you re-use the 3.16's version anyway?
>
> > +	pushq	5*8(%rdx)	/* pt_regs->ss */
> > +	pushq	4*8(%rdx)	/* pt_regs->rsp */
> > +	pushq	3*8(%rdx)	/* pt_regs->flags */
> > +	pushq	2*8(%rdx)	/* pt_regs->cs */
> > +	pushq	1*8(%rdx)	/* pt_regs->rip */
> > +	pushq	$-1		/* pt_regs->orig_ax */
> > +	pushq	%rdi		/* pt_regs->di */
> > +	pushq	%rsi		/* pt_regs->si */
> > +	pushq	(%rdx)		/* pt_regs->dx */
> > +	pushq	%rcx		/* pt_regs->cx */
> > +	pushq	%rax		/* pt_regs->ax */
> > +	pushq	%r8		/* pt_regs->r8 */
> > +	pushq	%r9		/* pt_regs->r9 */
> > +	pushq	%r10		/* pt_regs->r10 */
> > +	pushq	%r11		/* pt_regs->r11 */
> > +	pushq	%rbx		/* pt_regs->rbx */
> > +	pushq	%rbp		/* pt_regs->rbp */
> > +	pushq	%r12		/* pt_regs->r12 */
> > +	pushq	%r13		/* pt_regs->r13 */
> > +	pushq	%r14		/* pt_regs->r14 */
> > +	pushq	%r15		/* pt_regs->r15 */

Mh, so you mean
+ addq $KERNEL_STACK_OFFSET, %rsp
between
+ movq PER_CPU_VAR(kernel_stack), %rsp
and
+ pushq 5*8(%rdx) /* pt_regs->ss */
is missing? That seems to be the only difference between this patch and
Debian's 3.16.7-ckt11-1+deb8u2 [1] version.

[1] https://anonscm.debian.org/cgit/kernel/linux.git/tree/debian/patches/bugfix/x86/0006-x86-nmi-64-Switch-stacks-on-userspace-NMI-entry.patch?h=jessie#n69

-Thomas
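
For reference, the stack arithmetic behind the "wasting stack space" remark can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: it assumes the per-CPU kernel_stack value is the top of the thread stack minus KERNEL_STACK_OFFSET (5*8, as stated above), the THREAD_SIZE value is illustrative, and the function names are hypothetical helpers, not kernel symbols.

```c
#include <stdint.h>

/* Illustrative stack size; the result below does not depend on it. */
#define THREAD_SIZE          8192u
/* 5*8, as stated in the review comment. */
#define KERNEL_STACK_OFFSET  (5 * 8)
/* Number of pushq instructions in the quoted hunk (ss ... r15). */
#define PT_REGS_WORDS        21

/* Hypothetical helper: the value assumed to be stored in the per-CPU
 * kernel_stack variable for a stack whose lowest address is stack_page. */
uint64_t kernel_stack_value(uint64_t stack_page)
{
    return stack_page + THREAD_SIZE - KERNEL_STACK_OFFSET;
}

/* Hypothetical helper: bytes left unused between the top of the stack
 * and the first pushed word (pt_regs->ss). The first pushq stores at
 * rsp - 8, so the unused gap is simply top - rsp. */
uint64_t wasted_bytes(uint64_t stack_page, int add_offset)
{
    uint64_t rsp = kernel_stack_value(stack_page);

    if (add_offset)
        rsp += KERNEL_STACK_OFFSET;   /* models addq $KERNEL_STACK_OFFSET, %rsp */

    return (stack_page + THREAD_SIZE) - rsp;
}

/* Hypothetical helper: address of the lowest pt_regs word (r15)
 * after all PT_REGS_WORDS pushes starting from rsp. */
uint64_t frame_base(uint64_t rsp)
{
    return rsp - (uint64_t)PT_REGS_WORDS * 8;
}
```

Under these assumptions, wasted_bytes(page, 0) is 40 and wasted_bytes(page, 1) is 0: without the addq, the 21-word frame starts 40 bytes below the top of the stack, which is the reserved space the review comment refers to.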