Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution
From: Dan Williams <hidden>
Date: 2018-01-12 01:41:14
Also in:
linux-arch, linux-media, linux-scsi, linux-wireless, lkml
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds [off-list ref] wrote:
> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams [off-list ref] wrote:
>> This series incorporates Mark Rutland's latest ARM changes and adds the x86-specific implementation of 'ifence_array_ptr'. That ifence-based approach is provided as an opt-in fallback, but the default mitigation, '__array_ptr', uses a 'mask' approach that removes conditional branch instructions, and otherwise aims to redirect speculation to use a NULL pointer rather than a user controlled value.
>
> Do you have any performance numbers and perhaps example code generation? Is this noticeable? Are there any microbenchmarks showing the difference between lfence use and the masking model?
I don't have performance numbers, but here's sample code generation
from __fcheck_files, where the 'and; lea; and' sequence is the portion
of array_ptr() after the mask generation with 'sbb'.
        fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
     8e7:       8b 02                   mov    (%rdx),%eax
     8e9:       48 39 c7                cmp    %rax,%rdi
     8ec:       48 19 c9                sbb    %rcx,%rcx
     8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
     8f3:       48 89 fe                mov    %rdi,%rsi
     8f6:       48 21 ce                and    %rcx,%rsi
     8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
     8fd:       48 21 c8                and    %rcx,%rax
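
For readers following the listing, here is a minimal C sketch of the arithmetic that the 'sbb' mask and the 'and; lea; and' sequence perform. The names sketch_index_mask and sketch_array_ptr are hypothetical and this is not the code from the series. Plain C like this only illustrates the semantics: a compiler is free to emit a branch for the comparison or to fold the mask away, so it does not by itself guarantee the branch-free sequence in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns all ones when idx < sz and zero otherwise. This is the value
 * that 'cmp; sbb %rcx,%rcx' leaves in %rcx in the listing.
 */
static inline uintptr_t sketch_index_mask(uintptr_t idx, uintptr_t sz)
{
	return (uintptr_t)0 - (uintptr_t)(idx < sz);
}

/*
 * Hypothetical stand-in for array_ptr(): yields &base[idx] when idx is
 * in bounds and NULL when it is not.
 */
static inline long *sketch_array_ptr(long *base, uintptr_t idx, uintptr_t sz)
{
	uintptr_t mask = sketch_index_mask(idx, sz);

	/* first 'and': an out-of-bounds index is clamped to 0 */
	uintptr_t safe_idx = idx & mask;

	/* 'lea': form the element address from the clamped index */
	uintptr_t elem = (uintptr_t)(base + safe_idx);

	/* second 'and': an out-of-bounds request becomes a NULL pointer */
	return (long *)(elem & mask);
}
```

With a four-element array, sketch_array_ptr(arr, 2, 4) is &arr[2], while sketch_array_ptr(arr, 4, 4) is NULL.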
> Having both seems good for testing, but wouldn't we want to pick one in the end?
I was thinking we'd keep it as a 'just in case' sort of thing, at least until the 'probably safe' assumption of the 'mask' approach has more time to settle out.
> Also, I do think that there is one particular array load that would seem to be pretty obvious: the system call function pointer array.
>
> Yes, yes, the actual call is now behind a retpoline, but that protects against a speculative BTB access, it's not obvious that it protects against the mispredict of the __NR_syscall_max comparison in arch/x86/entry/entry_64.S.
>
> The act of fetching code is a kind of read too. And retpoline protects against BTB stuffing etc, but what if the _actual_ system call function address is wrong (due to mis-prediction of the system call index check)?
>
> Should the array access in entry_SYSCALL_64_fastpath be made to use the masking approach?
I'll take a look. I'm firmly in the 'patch first / worry later' stance on these investigations.
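
As a way to picture the system call table question, here is a rough user-space C sketch of masking an index before a function pointer table lookup. Everything in it (sketch_dispatch, sketch_table, the two dummy handlers) is hypothetical and is not the entry_64.S code. The empty asm statement is a GCC/Clang-specific way to hide the index from the optimizer so that the mask is not folded away after the bounds check; it is an assumption of this sketch, not something taken from the series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*sketch_syscall_fn)(long arg);

/* Dummy handlers standing in for real system call implementations. */
static long sketch_sys_inc(long arg) { return arg + 1; }
static long sketch_sys_dbl(long arg) { return arg * 2; }

static const sketch_syscall_fn sketch_table[] = {
	sketch_sys_inc,
	sketch_sys_dbl,
};

#define SKETCH_NR_SYSCALLS \
	((uintptr_t)(sizeof(sketch_table) / sizeof(sketch_table[0])))

/* All ones when idx < sz, zero otherwise. */
static inline uintptr_t sketch_nr_mask(uintptr_t idx, uintptr_t sz)
{
	/* GCC/Clang-specific: make idx opaque so the compare is kept. */
	__asm__ volatile("" : "+r"(idx));
	return (uintptr_t)0 - (uintptr_t)(idx < sz);
}

static long sketch_dispatch(uintptr_t nr, long arg)
{
	/* Architectural bounds check; this is the branch that can mispredict. */
	if (nr >= SKETCH_NR_SYSCALLS)
		return -ENOSYS;

	/*
	 * Data-dependent clamp: if execution got here speculatively with an
	 * out-of-range nr, the index used for the table load is 0 rather
	 * than a caller-controlled value.
	 */
	nr &= sketch_nr_mask(nr, SKETCH_NR_SYSCALLS);

	return sketch_table[nr](arg);
}
```

Architecturally nothing changes: sketch_dispatch(1, 5) still calls the second handler and an out-of-range number still returns -ENOSYS. The mask only matters on the mispredicted path.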