Thread (11 messages), 5 authors, 2019-07-03

Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration

From: Juliet Kim <hidden>
Date: 2019-06-26 21:42:01

On 6/25/19 1:51 PM, Nathan Lynch wrote:
> Juliet Kim [off-list ref] writes:
>> There's some concern this could retry forever, resulting in live lock.
> First of all the system will make progress in other areas even if there
> are repeated retries; we're not indefinitely holding locks or anything
> like that.

For instance, a system admin runs a script that picks and offlines CPUs in a
loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
putting CPUs back online, that would never finish, and would keep generating
new offline requests.

> Second, Linux checks the H_VASI_STATE result on every retry. If the
> platform wants to terminate the migration (say, if it imposes a
> timeout), Linux will abandon it when H_VASI_STATE fails to return
> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
> happens, absent hard errors on the Linux side such as allocation
> failures.

I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
request including H_VASI_STATE, so if the LPM retries were unlucky enough to
encounter repeated CPU offline attempts (maybe some customer code retrying
that), then the retries could continue indefinitely, or until some manual
intervention. And in the meantime, the LPM delay here would cause PHYP to
block other operations.
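
The two positions in this exchange can be modelled roughly as follows. This is a userspace sketch in Python, not the kernel code: `attempt_suspend`, its callbacks, the string return values, and the `max_retries` cap are all hypothetical names introduced for illustration, and the value given to `H_VASI_SUSPENDING` is a placeholder rather than the real hypervisor state code. With `max_retries=None` the loop only stops when the platform leaves the suspending state or the race goes away; with a cap it can also give up on its own.

```python
from typing import Callable, Optional

# Placeholder value (assumption); the real constant is a hypervisor state code.
H_VASI_SUSPENDING = "suspending"


def attempt_suspend(
    vasi_state: Callable[[], str],
    cpu_offline_raced: Callable[[], bool],
    max_retries: Optional[int] = None,
) -> str:
    """Hypothetical model of the retry loop discussed in this thread.

    Returns "suspended" if an attempt did not race with a CPU offline,
    "aborted" if the platform is no longer in the suspending state, or
    "gave_up" if the (assumed, optional) retry budget is exhausted.
    """
    retries = 0
    while True:
        # Re-check the platform state before every attempt.
        if vasi_state() != H_VASI_SUSPENDING:
            return "aborted"
        # An attempt succeeds only if no CPU went offline underneath it.
        if not cpu_offline_raced():
            return "suspended"
        retries += 1
        # Without a cap, a persistent race keeps this loop running.
        if max_retries is not None and retries >= max_retries:
            return "gave_up"
```

If the platform never times out the request and something keeps offlining CPUs, the uncapped form has no exit; calling it with, say, `max_retries=3` returns `"gave_up"` after the third raced attempt.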