Re: powerpc Linux scv support and scv system call ABI proposal
From: Adhemerval Zanella <hidden>
Date: 2020-01-28 17:28:28
On 28/01/2020 11:05, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
>> * Nicholas Piggin:
>>
>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all
>>>   other vectors will return -ENOSYS, and the decision for how to add
>>>   support for a new vector deferred until we see the next user.
>>
>> Seems reasonable. We don't have to decide this today.
>>
>>> * Proposal is for scv 0 to provide the standard Linux system call ABI
>>>   with some differences:
>>>
>>> - LR is volatile across scv calls. This is necessary for support
>>>   because the scv instruction clobbers LR.
>>
>> I think we can express this in the glibc system call assembler wrapper
>> generators. The mcount profiling wrappers already have this property.
>>
>> But I don't think we are so lucky for the inline system calls. GCC
>> recognizes an "lr" clobber with inline asm (even though it is not
>> documented), but it generates rather strange assembler output as a
>> result:
>>
>> long
>> f (long x)
>> {
>>   long y;
>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>   return y;
>> }
>>
>>         .abiversion 2
>>         .section ".text"
>>         .align 2
>>         .p2align 4,,15
>>         .globl f
>>         .type f, @function
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         mflr 0
>>         .cfi_register 65, 0
>> #APP
>>  # 5 "t.c" 1
>>         #
>>  # 0 "" 2
>> #NO_APP
>>         std 0,16(1)
>>         .cfi_offset 65, 16
>>         ori 2,2,0
>>         ld 0,16(1)
>>         mtlr 0
>>         .cfi_restore 65
>>         blr
>>         .long 0
>>         .byte 0,0,0,1,0,0,0,0
>>         .cfi_endproc
>> .LFE0:
>>         .size f,.-f
>>
>> That's with GCC 8.3 at -O2. I don't understand what the ori is about.
>
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.
It is worth noting that it aims to mitigate a load-hit-store CPU stall
on some powerpc chips.
>> I don't think we can save LR in a regular register around the system
>> call, explicitly in the inline asm statement, because we still have to
>> generate proper unwinding information using CFI directives, something
>> that you cannot do from within the asm statement.
>>
>> Supporting this in GCC should not be impossible, but someone who
>> actually knows this stuff needs to look at it.
>
> The generated assembler actually seems okay to me. If we compile
> something like a syscall and with -mcpu=power9:
>
> long
> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
> {
>   register long r0 asm ("r0") = _r0;
>   register long r3 asm ("r3") = _r3;
>   register long r4 asm ("r4") = _r4;
>   register long r5 asm ("r5") = _r5;
>   register long r6 asm ("r6") = _r6;
>   register long r7 asm ("r7") = _r7;
>   register long r8 asm ("r8") = _r8;
>
>   asm ("# scv"
>        : "=r"(r3)
>        : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), "r"(r8)
>        : "lr", "ctr", "cc", "xer");
>
>   return r3;
> }
>
> f:
> .LFB0:
>         .cfi_startproc
>         mflr 0
>         std 0,16(1)
>         .cfi_offset 65, 16
>         mr 0,9
> #APP
>  # 12 "a.c" 1
>         # scv
>  # 0 "" 2
> #NO_APP
>         ld 0,16(1)
>         mtlr 0
>         .cfi_restore 65
>         blr
>         .long 0
>         .byte 0,0,0,1,0,0,0,0
>         .cfi_endproc
>
> That gets the LR save/restore right when we're also using r0.
>
>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would
>>>   allow the system call exit to avoid restoring the CR register.
>>
>> This sounds reasonable, but I don't know what kind of knock-on effects
>> this has. The inline system call wrappers can handle this with minor
>> tweaks.
>
> Okay, good. In the end we would have to check code trace through the
> kernel and libc of course, but I think there's little to no opportunity
> to take advantage of current extra non-volatile cr regs.
>
> mtcr has to write 8 independently renamed registers so it's cracked
> into 2 insns on POWER9 (and likely to always be a bit troublesome).
> It's not much in the scheme of a system call, but while we can tweak
> the ABI...
We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
powerpc; we can use a 'bns+' plus a 'neg' instead, as in:
--
#define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5, \
                          arg6) \
  ({ \
    register long int r0 __asm__ ("r0") = (long int) (name); \
    register long int r3 __asm__ ("r3") = (long int) (arg1); \
    register long int r4 __asm__ ("r4") = (long int) (arg2); \
    register long int r5 __asm__ ("r5") = (long int) (arg3); \
    register long int r6 __asm__ ("r6") = (long int) (arg4); \
    register long int r7 __asm__ ("r7") = (long int) (arg5); \
    register long int r8 __asm__ ("r8") = (long int) (arg6); \
    __asm__ __volatile__ \
      ("sc\n\t" \
       "bns+ 1f\n\t" \
       "neg %1, %1\n\t" \
       "1:\n\t" \
       : "+r" (r0), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6), \
         "+r" (r7), "+r" (r8) \
       : \
       : "r9", "r10", "r11", "r12", \
         "cr0", "memory"); \
    r3; \
  })
--
And change INTERNAL_SYSCALL_ERROR_P to check for the expected invalid
range (((unsigned long) (val) >= (unsigned long) -4095)) and
INTERNAL_SYSCALL_ERRNO to return a negative value (since the value will
be negated by INTERNAL_SYSCALL_ERROR_P).
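
As an illustration of the range check described above, here is a minimal
sketch in plain C. The helper names raw_syscall_failed and
raw_syscall_errno are hypothetical stand-ins, not glibc's actual macros;
the sketch only assumes that the kernel reports failure as a value in
[-4095, -1].

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; these are not glibc's
   INTERNAL_SYSCALL_ERROR_P / INTERNAL_SYSCALL_ERRNO definitions.  */

/* True if a raw return value lies in [-4095, -1] when read as a signed
   long, which is the range assumed here to encode -errno.  */
static inline bool
raw_syscall_failed (long val)
{
  return (unsigned long) val >= (unsigned long) -4095L;
}

/* Recover the positive errno code from a failing raw value.  */
static inline int
raw_syscall_errno (long val)
{
  return (int) -val;
}
```

The single unsigned comparison covers the whole error range, so large
valid results (for example addresses with the top bit set, as long as
they are below -4095 when read as signed) are not mistaken for errors.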
The fact that the powerpc kernel ABI uses a different convention to
signal errors also requires glibc to reimplement the vDSO symbol call
in an arch-specific way instead of as a straight function call (since
it might fall back to a syscall).
Even for a POWER-specific system call that uses all result bits, either
it should not fail or it would require an arch-specific implementation
to set up the expected error value (since the information would have to
come from another source or a pre-defined value).
In fact I think we could make the assumption that INTERNAL_SYSCALL
returns a negative errno value in case of an error, and make all the
handling that checks for a syscall failure and sets errno generic. This
would require changing ia64, mips, nios2, and sparc though.
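
A rough sketch of what such a generic tail could look like, assuming
every architecture hands back the raw kernel value with the -errno
convention. The function name finish_syscall is hypothetical and is not
part of glibc.

```c
#include <errno.h>

/* Hypothetical generic epilogue for a syscall wrapper: RAW is the value
   returned by the kernel, assumed to follow the -errno convention on
   every architecture.  */
static inline long
finish_syscall (long raw)
{
  /* Values in [-4095, -1] are treated as failures.  */
  if ((unsigned long) raw >= (unsigned long) -4095L)
    {
      errno = (int) -raw;   /* Store the positive error code.  */
      return -1;            /* Usual C library failure result.  */
    }
  return raw;               /* Success: pass the result through.  */
}
```

With this shape, an architecture would only need to provide the raw
syscall sequence; the failure test and the errno update would be shared.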
>>> - Error handling: use of CR0[SO] to indicate error requires a mtcr /
>>>   mtocr instruction on the kernel side, and it is currently not
>>>   implemented well in glibc, requiring a mfcr (mfocr should be
>>>   possible and asm goto support would allow a better implementation).
>>>   Is it worth continuing this style of error handling? Or just move
>>>   to -ve return means error? Using a different bit would allow the
>>>   kernel to piggy back the CR return code setting with a test for the
>>>   error case exit.
>>
>> GCC does not model the condition registers, so for inline system
>> calls, we have to produce a value anyway that the subsequence C code
>> can check. The assembler syscall wrappers do not need to do this, of
>> course, but I'm not sure which category of interfaces is more
>> important.
>
> Right. asm goto can improve this kind of pattern if it's inlined into
> the C code which tests the result, it can branch using the flags to the
> C error handling label, rather than move flags into GPR, test GPR,
> branch. However...
>
>> But the kernel uses the -errno convention internally, so I think it
>> would make sense to pass this to userspace and not convert back and
>> forth. This would align with what most of the architectures do, and
>> also avoids the GCC oddity.
>
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.
>
>>> - Should this be for 64-bit only? 'scv 1' could be reserved for
>>>   32-bit calls if there was interest in developing an ABI for 32-bit
>>>   programs. Marginal benefit in avoiding compat syscall selection.
>>
>> We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to
>> provide an ELFv1 port for this given that it's POWER9-specific.
>
> Okay. There's no reason not to enable this for BE, at least for the
> kernel it's no additional work so it probably remains enabled (unless
> there is something really good we could do with the ABI if we exclude
> ELFv1 but I don't see anything). But if glibc only builds for ELFv2
> support that's probably reasonable.
>
>> From the glibc perspective, the major question is how we handle
>> run-time selection of the system call instruction sequence. On i386,
>> we use a function pointer in the TCB to call an instruction sequence
>> in the vDSO. That's problematic from a security perspective. I expect
>> that on POWER9, using a pointer in read-only memory would be equally
>> non-attractive due to a similar lack of PC-relative addressing. We
>> could use the HWCAP bit in the TCB, but that would add another (easy
>> to predict) conditional branch to every system call.
>
> I would have to defer to glibc devs on this. Conditional branch should
> be acceptable I think, scv improves speed as much as several
> mispredicted branches (about 90 cycles).
>
>> I don't think it matters whether both system call variants use the
>> same error convention because we could have different error code
>> extraction code on the two branches.
>
> That's one less difficulty.
We already had to push a similar hack where glibc used to abort
transactions prior to syscalls to avoid some side effects in the kernel
(commit 56cf2763819d2f). It was eventually removed from syscall handling
by f0458cf4f9ff3d870, where we only enable TLE if the kernel supports
PPC_FEATURE2_HTM_NOSC. The transaction syscall abort used to read a
variable directly from the TCB, so this could be an option. I would
expect that we could optimize it so that, if glibc is building against
a recent kernel and the compiler is building for an ISA 3.0+ CPU, we
could remove the 'sc' code.
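
To make the selection logic concrete, here is a small sketch in C of the
decision only, not of the syscall sequence itself. Everything in it is
an assumption made for illustration: the enum, the function name
select_syscall_insn, the baseline_has_scv flag (standing in for the
"recent kernel plus ISA 3.0+ build" case), and the fallback value given
to PPC_FEATURE2_SCV when the headers do not define it.

```c
#include <stdbool.h>

#ifndef PPC_FEATURE2_SCV
/* Assumed placeholder bit for this sketch; the real value comes from
   the kernel headers.  */
# define PPC_FEATURE2_SCV 0x00100000UL
#endif

/* Hypothetical names, used only in this sketch.  */
enum syscall_insn { SYSCALL_INSN_SC, SYSCALL_INSN_SCV };

/* HWCAP2 is the word that would be cached in the TCB.  BASELINE_HAS_SCV
   models a build where scv is guaranteed, so the run-time test (and the
   'sc' path) can be dropped.  */
static inline enum syscall_insn
select_syscall_insn (unsigned long hwcap2, bool baseline_has_scv)
{
  if (baseline_has_scv)
    return SYSCALL_INSN_SCV;

  /* One easy-to-predict conditional branch per system call.  */
  return (hwcap2 & PPC_FEATURE2_SCV) ? SYSCALL_INSN_SCV : SYSCALL_INSN_SC;
}
```

In a real implementation the hwcap2 argument would be read once at
startup, for example via getauxval (AT_HWCAP2), and each branch would
carry its own error code extraction.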