Re: [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
From: Nicholas Piggin <npiggin@gmail.com>
Date: 2020-03-20 03:43:31
Subsystem:
linux for powerpc (32-bit and 64-bit), the rest · Maintainers:
Madhavan Srinivasan, Michael Ellerman, Linus Torvalds
Christophe Leroy's on March 19, 2020 7:18 pm:
Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :quoted
System call entry and particularly exit code is beyond the limit of what is reasonable to implement in asm. This conversion moves all conditional branches out of the asm code, except for the case that all GPRs should be restored at exit. Null syscall test is about 5% faster after this patch, because the exit work is handled under local_irq_disable, and the hard mask and pending interrupt replay is handled after that, which avoids games with MSR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michal Suchanek <redacted> --- v2,rebase (from Michal): - Add endian conversion for dtl_idx (ms) - Fix sparse warning about missing declaration (ms) - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe) v3: Fixes thanks to reports from mpe and selftests errors: - Several soft-mask debug and unsafe smp_processor_id() warnings due to tracing and other false positives due to checks in "unreconciled" code. - Fix a bug with syscall tracing functions that set registers (e.g., PTRACE_SETREG) not setting GPRs properly. - Fix silly tabort_syscall bug that causes kernel crashes when making system calls in transactional state. arch/powerpc/include/asm/asm-prototypes.h | 17 +- .../powerpc/include/asm/book3s/64/kup-radix.h | 14 +- arch/powerpc/include/asm/cputime.h | 29 ++ arch/powerpc/include/asm/hw_irq.h | 4 + arch/powerpc/include/asm/ptrace.h | 3 + arch/powerpc/include/asm/signal.h | 3 + arch/powerpc/include/asm/switch_to.h | 5 + arch/powerpc/include/asm/time.h | 3 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/entry_64.S | 338 +++--------------- arch/powerpc/kernel/signal.h | 2 - arch/powerpc/kernel/syscall_64.c | 213 +++++++++++ arch/powerpc/kernel/systbl.S | 9 +- 13 files changed, 328 insertions(+), 315 deletions(-) create mode 100644 arch/powerpc/kernel/syscall_64.cdiff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 983c0084fb3f..4b3609554e76 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h@@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, unsigned long __init early_init(unsigned long dt_ptr); void __init machine_init(u64 dt_ptr); #endif +#ifdef CONFIG_PPC64This ifdef is not necessary as it has no pending #else. Having function declaration without definition is not an issue. Keeping in mind that we are aiming at generalising this to PPC32.
Well there's other unnecessary ifdefs in there too I think. But sure. This patch also got the interrupt_exit_ prototypes leaked in from the later patch so I could fix those.
quoted
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index 2431b4ada2fa..6639a6847cc0 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h@@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct) #ifdef CONFIG_PPC64 #define get_accounting(tsk) (&get_paca()->accounting) static inline void arch_vtime_task_switch(struct task_struct *tsk) { }Could we have the below additions sit outside of this PPC64 ifdef, to be reused on PPC32 ?
Okay.
quoted
+ +/* + * account_cpu_user_entry/exit runs "unreconciled", so can't trace, + * can't use use get_paca() + */ +static notrace inline void account_cpu_user_entry(void) +{ + unsigned long tb = mftb(); + struct cpu_accounting_data *acct = &local_paca->accounting;In the spirit of reusing that code on PPC32, can we use get_accounting() ? Or an alternate version of get_accounting(), eg get_accounting_notrace() to be defined ?
Okay.
quoted
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.cCould some part of it go in a syscall.c to be reused on PPC32 ?
I could put it all in syscall.c and then we can adjust with some ifdefs or helpers. I don't think there is enough to be worth syscall.c, syscall_32.c, and syscall_64.c. I wonder about the interrupt returns as well, that doesn't really make sense in a file called syscall.c, but the code is very similar to system call exit. Should we just call it interrupts.c?
quoted
+ /* + * This is not required for the syscall exit path, but makes the + * stack frame look nicer. If this was initialised in the first stack + * frame, or if the unwinder was taught the first stack frame always + * returns to user with IRQS_ENABLED, this store could be avoided! + */ + regs->softe = IRQS_ENABLED;softe doesn't exist on PPC32. Can we do that through a helper ?
I guess, we can have regs_set_irq_state(regs, IRQS_ENABLED); or something like that. We make that helper and a _get_ counterpart in a later patch which covers other cases in the tree as well.
quoted
+ + __hard_irq_enable();This doesn't exist on PPC32. Should we define __hard_irq_enable() as arch_local_irq_enable() on PPC32 ?
This goes away with patch 29. Better not to have this ugly thing spill into ppc32 code at all if we can avoid it :)
quoted
+ + ti_flags = current_thread_info()->flags; + if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) { + /* + * We use the return value of do_syscall_trace_enter() as the + * syscall number. If the syscall was rejected for any reason + * do_syscall_trace_enter() returns an invalid syscall number + * and the test against NR_syscalls will fail and the return + * value to be used is in regs->gpr[3]. + */ + r0 = do_syscall_trace_enter(regs); + if (unlikely(r0 >= NR_syscalls)) + return regs->gpr[3]; + r3 = regs->gpr[3]; + r4 = regs->gpr[4]; + r5 = regs->gpr[5]; + r6 = regs->gpr[6]; + r7 = regs->gpr[7]; + r8 = regs->gpr[8]; + + } else if (unlikely(r0 >= NR_syscalls)) { + return -ENOSYS; + } + + /* May be faster to do array_index_nospec? */ + barrier_nospec(); + + if (unlikely(ti_flags & _TIF_32BIT)) {Use is_compat_task() instead ?
Michal pointed this out, he's got patches that do this on top of this series. Incremental diff for your suggestions below. Now there is likely we're going to have a few ifdefs, particularly in the exit paths where we have complexity handling irq soft masked state where helpers dont make much sense. I don't think that will be such a bad thing, but we can come to it as we go. Thanks, Nick --- arch/powerpc/include/asm/asm-prototypes.h | 4 --- arch/powerpc/include/asm/cputime.h | 38 +++++++++++++---------- 2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..ab59a4904254 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h@@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, unsigned long __init early_init(unsigned long dt_ptr); void __init machine_init(u64 dt_ptr); #endif -#ifdef CONFIG_PPC64 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs); notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs); -notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr); -notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr); -#endif long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 6639a6847cc0..0fccd5ea1e9a 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h@@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct) */ #ifdef CONFIG_PPC64 #define get_accounting(tsk) (&get_paca()->accounting) +#define raw_get_accounting(tsk) (&local_paca->accounting) static inline void arch_vtime_task_switch(struct task_struct *tsk) { } +#else +#define get_accounting(tsk) (&task_thread_info(tsk)->accounting) +#define raw_get_accounting(tsk) get_accounting(tsk) +/* + * Called from the context switch with interrupts disabled, to charge all + * accumulated times to the current process, and to prepare accounting on + * the next process. + */ +static inline void arch_vtime_task_switch(struct task_struct *prev) +{ + struct cpu_accounting_data *acct = get_accounting(current); + struct cpu_accounting_data *acct0 = get_accounting(prev); + + acct->starttime = acct0->starttime; +} +#endif + /* * account_cpu_user_entry/exit runs "unreconciled", so can't trace, * can't use use get_paca()
@@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { } static notrace inline void account_cpu_user_entry(void) { unsigned long tb = mftb(); - struct cpu_accounting_data *acct = &local_paca->accounting; + struct cpu_accounting_data *acct = raw_get_accounting(current); acct->utime += (tb - acct->starttime_user); acct->starttime = tb; } + static notrace inline void account_cpu_user_exit(void) { unsigned long tb = mftb(); - struct cpu_accounting_data *acct = &local_paca->accounting; + struct cpu_accounting_data *acct = raw_get_accounting(current); acct->stime += (tb - acct->starttime); acct->starttime_user = tb; } -#else -#define get_accounting(tsk) (&task_thread_info(tsk)->accounting) -/* - * Called from the context switch with interrupts disabled, to charge all - * accumulated times to the current process, and to prepare accounting on - * the next process. - */ -static inline void arch_vtime_task_switch(struct task_struct *prev) -{ - struct cpu_accounting_data *acct = get_accounting(current); - struct cpu_accounting_data *acct0 = get_accounting(prev); - - acct->starttime = acct0->starttime; -} -#endif #endif /* __KERNEL__ */ #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
--
2.23.0