Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
From: Nathan Lynch <hidden>
Date: 2023-03-23 13:41:14
Michael Ellerman [off-list ref] writes:
> Nathan Lynch via B4 Relay [off-list ref] writes:
> > From: Nathan Lynch <redacted>
> >
> > The kernel can handle retrying RTAS function calls in response to
> > -2/990x in the sys_rtas() handler instead of relaying the intermediate
> > status to user space.
>
> This looks good in general. One query ...
>
> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > index 47a2aa43d7d4..c330a22ccc70 100644
> > --- a/arch/powerpc/kernel/rtas.c
> > +++ b/arch/powerpc/kernel/rtas.c
> > @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
> >  /* We assume to be passed big endian arguments */
> >  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> >  {
> > -	struct pin_cookie cookie;
> >  	struct rtas_args args;
> >  	unsigned long flags;
> >  	char *buff_copy, *errbuf = NULL;
> > @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> >  	buff_copy = get_errorlog_buffer();
> >  
> > -	raw_spin_lock_irqsave(&rtas_lock, flags);
> > -	cookie = lockdep_pin_lock(&rtas_lock);
> > +	do {
> > +		struct pin_cookie cookie;
> >  
> > -	rtas_args = args;
> > -	do_enter_rtas(&rtas_args);
> > -	args = rtas_args;
> > +		raw_spin_lock_irqsave(&rtas_lock, flags);
> > +		cookie = lockdep_pin_lock(&rtas_lock);
> >  
> > -	/* A -1 return code indicates that the last command couldn't
> > -	   be completed due to a hardware error. */
> > -	if (be32_to_cpu(args.rets[0]) == -1)
> > -		errbuf = __fetch_rtas_last_error(buff_copy);
> > +		rtas_args = args;
> > +		do_enter_rtas(&rtas_args);
> > +		args = rtas_args;
> >  
> > -	lockdep_unpin_lock(&rtas_lock, cookie);
> > -	raw_spin_unlock_irqrestore(&rtas_lock, flags);
> > +		/*
> > +		 * Handle error record retrieval before releasing the lock.
> > +		 */
> > +		if (be32_to_cpu(args.rets[0]) == -1)
> > +			errbuf = __fetch_rtas_last_error(buff_copy);
> > +
> > +		lockdep_unpin_lock(&rtas_lock, cookie);
> > +		raw_spin_unlock_irqrestore(&rtas_lock, flags);
> > +	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>
> rtas_busy_delay_early() has the successive_ext_delays case that will
> break out eventually. But if we keep getting plain RTAS_BUSY back from
> RTAS I *think* this loop will never terminate?
Yes, but if this happens, then there is a serious bug in Linux or RTAS. The only time I've seen something like that on PowerVM is when Linux corrupted internal RTAS state by not serializing calls correctly. rtas_busy_delay_early() has a bail-out heuristic, not for RTAS_BUSY, but for extended delay statuses (990x), which I suspect happen rarely (if ever) that early. That's there in order to allow boot to proceed and hopefully get useful messages out in a truly unexpected circumstance. That said...
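To make the distinction concrete, here is a rough userspace model of the kind of bail-out heuristic described above. It is only a sketch, not the kernel's code: the model_* names and the MODEL_MAX_EXT_DELAYS cap are made up for illustration, and the real function also delays between retries. The point it shows is that only a run of extended delay statuses (990x) can end the retry loop, while plain RTAS_BUSY (-2) always asks for another retry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Status values as discussed in the thread: -2 is RTAS_BUSY, 990x are extended delays. */
#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/* Hypothetical cap, chosen for illustration only. */
#define MODEL_MAX_EXT_DELAYS    1000

/* Hypothetical state holder; tracks consecutive extended delay statuses. */
struct model_early_state {
	size_t successive_ext_delays;
};

/* Returns true if the caller should retry the call, false if it should stop. */
static bool model_busy_delay_early(struct model_early_state *st, int status)
{
	if (status >= RTAS_EXTENDED_DELAY_MIN &&
	    status <= RTAS_EXTENDED_DELAY_MAX) {
		st->successive_ext_delays++;
		/* Too many extended delays in a row: stop retrying. */
		if (st->successive_ext_delays > MODEL_MAX_EXT_DELAYS)
			return false;
		return true;
	}

	/* Any other status resets the counter; only RTAS_BUSY means "retry". */
	st->successive_ext_delays = 0;
	return status == RTAS_BUSY;
}
```

In this model an unbroken stream of RTAS_BUSY never trips the cap, which is the case the query is about.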
> To avoid that, and just as good manners, I think we should have a
> fatal_signal_pending() check, and if that returns true we bail out of
> the syscall with -EINTR?
That probably makes sense. In its current state, I could see this patch preventing or delaying OS shutdown in situations where it wouldn't have occurred before. I think I would want the bailout condition in this case to be (fatal_signal_pending() && retries > some_threshold), to reduce the likelihood of non-"stuck" operations being left unfinished. And it should dump a stack trace.
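Roughly what I have in mind, as a userspace model rather than a patch. Everything named model_* is hypothetical, the threshold value is arbitrary, and the fatal_signal flag stands in for fatal_signal_pending(current). A real version would sit in the sys_rtas() loop and call dump_stack() before returning.

```c
#include <errno.h>
#include <stdbool.h>

#define RTAS_BUSY             (-2)
#define MODEL_RETRY_THRESHOLD 100	/* hypothetical; real value to be decided */

/* Hypothetical stand-in for firmware plus task state. */
struct model_firmware {
	unsigned int busy_calls_left;	/* calls that will still return RTAS_BUSY */
	bool fatal_signal;		/* models fatal_signal_pending(current) */
};

/* Models one RTAS entry: busy for a while, then success (0). */
static int model_enter_rtas(struct model_firmware *fw)
{
	if (fw->busy_calls_left > 0) {
		fw->busy_calls_left--;
		return RTAS_BUSY;
	}
	return 0;
}

/* Bail out only if a fatal signal is pending AND we have retried a lot. */
static bool model_should_bail(bool fatal_signal, unsigned int retries)
{
	return fatal_signal && retries > MODEL_RETRY_THRESHOLD;
}

/* Returns 0 when the call completed, -EINTR when the retry loop was abandoned. */
static int model_sys_rtas(struct model_firmware *fw, int *status)
{
	unsigned int retries = 0;

	for (;;) {
		*status = model_enter_rtas(fw);
		if (*status != RTAS_BUSY)
			return 0;
		/* The kernel version would also dump a stack trace here. */
		if (model_should_bail(fw->fatal_signal, retries))
			return -EINTR;
		retries++;
	}
}
```

With this shape, an operation that needs only a handful of retries still runs to completion even when the task has a fatal signal pending; only a call that stays busy past the threshold is abandoned.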