Thread (29 messages) 29 messages, 5 authors, 2024-11-24

RE: [PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()

From: David Laight <hidden>
Date: 2024-11-22 09:39:02
Also in: lkml

From: Linus Torvalds
Sent: 21 November 2024 22:16

On Thu, 21 Nov 2024 at 13:40, Josh Poimboeuf [off-list ref] wrote:
quoted
The profile is showing futex_get_value_locked():
Ahh.
quoted
That has several callers, so we can probably just use get_user() there?
Yeah, that's the simplest thing. That thing isn't even some inline
function, so the real cost is the call.

That said, exactly because it's not inlined, and calls are expensive,
and this is apparently really critical, we can just do it with the
full "unsafe_get_user()" model.

It's not so complicated. The attached patch is untested, but I did
check that it generates almost perfect code:

    mov    %gs:0x0,%rax                 # current
    incl   0x1a9c(%rax)                 # current->pagefault_disable++
    movabs $0x123456789abcdef,%rcx      # magic virtual address size
    cmp    %rsi,%rcx                    # address masking
    sbb    %rcx,%rcx
    or     %rsi,%rcx
    stac                                # enable user space acccess
    mov    (%rcx),%ecx                  # get the value
    clac                                # disable user space access
    decl   0x1a9c(%rax)                 # current->pagefault_disable--
    mov    %ecx,(%rdi)                  # save the value
    xor    %eax,%eax                    # return 0
    ret

(with the error case for the page fault all out-of-line).
I presume you are assuming an earlier access_ok() call? 

IIRC all x86 from 286 onwards fault accesses that cross the ~0 to 0 boundary.
So you don't need to have called access_ok() prior to the above
for non-byte accesses.
Even for byte accesses (and with ~0 being a valid address) do the
address mask before pagefault_disable++ and the error test is 'jc label'
after the 'cmp'.

Don't you also get better code for an 'asm output with goto' form?
While it requires the caller have a 'label: return -EFAULT;' somewhere
that is quite common in kernel code.
So this should be _faster_ than the old __get_user(), because while
the address masking is not needed, it's cheaper than the function call
used to be and the error handling is better.
Perhaps get_user() should be the function call and __get_user() inlined.
Both including address validation but different calling conventions?
(After fixing the code that doesn't want address masking.)

...
Now, we could possibly say "just remove the fence in __get_user()
entirely", but that would involve moving it to access_ok().
How valid would it be to assume an explicit access_ok() was far
enough away from the get_user() that you couldn't speculate all the
way to something that used the read value to perform another
kernel access?
And then it wouldn't actually speed anything up (except the horrid
architecture-specific kernel uses that then don't call access_ok() at
all - and we don't care about *those*).
Find and kill :-)

	David
               Linus
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help