[PATCH 3.16 56/76] x86/syscall: Sanitize syscall table de-references under speculation
From: Ben Hutchings <hidden>
Date: 2018-03-12 03:06:12
Also in:
lkml, stable
3.16.56-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Ben Hutchings <redacted> commit 2fbd7af5af8665d18bcefae3e9700be07e22b681 upstream. The upstream version of this, touching C code, was written by Dan Williams, with the following description:
The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior.
The x86_64 assembly version for 4.4 was written by Jiri Slaby, with the following description:
In 4.4.118, we have commit c8961332d6da (x86/syscall: Sanitize syscall table de-references under speculation), which is a backport of upstream commit 2fbd7af5af86. But it fixed only the C part of the upstream patch -- the IA32 sysentry. So it ommitted completely the assembly part -- the 64bit sysentry. Fix that in this patch by explicit array_index_mask_nospec written in assembly. The same was used in lib/getuser.S. However, to have "sbb" working properly, we have to switch from "cmp" against (NR_syscalls-1) to (NR_syscalls), otherwise the last syscall number would be "and"ed by 0. It is because the original "ja" relies on "CF" or "ZF", but we rely only on "CF" in "sbb". That means: switch to "jae" conditional jump too. Final note: use rcx for mask as this is exactly what is overwritten by the 4th syscall argument (r10) right after.
In 3.16 the x86_32 syscall table lookup is also written in assembly. So I've taken Jiri's version and added similar masking in entry_32.S, using edx as the temporary. edx is clobbered by SAVE_REGS and seems to be free at this point. Cc: Dan Williams <redacted> Cc: Jiri Slaby <redacted> Cc: Jan Beulich <redacted> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <redacted> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Andy Lutomirski <luto@kernel.org> Cc: alan@linux.intel.com Cc: Jinpu Wang <redacted> Signed-off-by: Ben Hutchings <redacted> ---
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S@@ -426,6 +426,8 @@ sysenter_past_esp: sysenter_do_call: cmpl $(NR_syscalls), %eax jae sysenter_badsys + sbb %edx, %edx /* array_index_mask_nospec() */ + and %edx, %eax call *sys_call_table(,%eax,4) sysenter_after_call: movl %eax,PT_EAX(%esp)
@@ -503,6 +505,8 @@ ENTRY(system_call) cmpl $(NR_syscalls), %eax jae syscall_badsys syscall_call: + sbb %edx, %edx /* array_index_mask_nospec() */ + and %edx, %eax call *sys_call_table(,%eax,4) syscall_after_call: movl %eax,PT_EAX(%esp) # store the return value --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S
@@ -445,12 +445,14 @@ GLOBAL(system_call_after_swapgs) jnz tracesys system_call_fastpath: #if __SYSCALL_MASK == ~0 - cmpq $__NR_syscall_max,%rax + cmpq $NR_syscalls, %rax #else andl $__SYSCALL_MASK,%eax - cmpl $__NR_syscall_max,%eax + cmpl $NR_syscalls, %eax #endif - ja badsys + jae badsys + sbb %rcx, %rcx /* array_index_mask_nospec() */ + and %rcx, %rax movq %r10,%rcx #ifdef CONFIG_RETPOLINE movq sys_call_table(, %rax, 8), %rax
@@ -577,12 +579,14 @@ tracesys: LOAD_ARGS ARGOFFSET, 1 RESTORE_REST #if __SYSCALL_MASK == ~0 - cmpq $__NR_syscall_max,%rax + cmpq $NR_syscalls, %rax #else andl $__SYSCALL_MASK,%eax - cmpl $__NR_syscall_max,%eax + cmpl $NR_syscalls, %eax #endif - ja int_ret_from_sys_call /* RAX(%rsp) set to -ENOSYS above */ + jae int_ret_from_sys_call /* RAX(%rsp) set to -ENOSYS above */ + sbb %rcx, %rcx /* array_index_mask_nospec() */ + and %rcx, %rax movq %r10,%rcx /* fixup for C */ #ifdef CONFIG_RETPOLINE movq sys_call_table(, %rax, 8), %rax