Thread (24 messages) 24 messages, 8 authors, 2018-01-18

Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

From: Dan Williams <hidden>
Date: 2018-01-12 01:41:14
Also in: linux-arch, linux-media, linux-scsi, linux-wireless, lkml

On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
[off-list ref] wrote:
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams [off-list ref] wrote:
quoted
This series incorporates Mark Rutland's latest ARM changes and adds
the x86 specific implementation of 'ifence_array_ptr'. That ifence
based approach is provided as an opt-in fallback, but the default
mitigation, '__array_ptr', uses a 'mask' approach that removes
conditional branches instructions, and otherwise aims to redirect
speculation to use a NULL pointer rather than a user controlled value.
Do you have any performance numbers and perhaps example code
generation? Is this noticeable? Are there any microbenchmarks showing
the difference between lfence use and the masking model?
I don't have performance numbers, but here's a sample code generation
from __fcheck_files, where the 'and; lea; and' sequence is portion of
array_ptr() after the mask generation with 'sbb'.

        fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
     8e7:       8b 02                   mov    (%rdx),%eax
     8e9:       48 39 c7                cmp    %rax,%rdi
     8ec:       48 19 c9                sbb    %rcx,%rcx
     8ef:       48 8b 42 08             mov    0x8(%rdx),%rax
     8f3:       48 89 fe                mov    %rdi,%rsi
     8f6:       48 21 ce                and    %rcx,%rsi
     8f9:       48 8d 04 f0             lea    (%rax,%rsi,8),%rax
     8fd:       48 21 c8                and    %rcx,%rax

Having both seems good for testing, but wouldn't we want to pick one in the end?
I was thinking we'd keep it as a 'just in case' sort of thing, at
least until the 'probably safe' assumption of the 'mask' approach has
more time to settle out.
Also, I do think that there is one particular array load that would
seem to be pretty obvious: the system call function pointer array.

Yes, yes, the actual call is now behind a retpoline, but that protects
against a speculative BTB access, it's not obvious that it  protects
against the mispredict of the __NR_syscall_max comparison in
arch/x86/entry/entry_64.S.

The act of fetching code is a kind of read too. And retpoline protects
against BTB stuffing etc, but what if the _actual_ system call
function address is wrong (due to mis-prediction of the system call
index check)?

Should the array access in entry_SYSCALL_64_fastpath be made to use
the masking approach?
I'll take a look. I'm firmly in the 'patch first / worry later' stance
on these investigations.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help