Re: [PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline

From: Nathan Lynch <hidden>
Date: 2019-06-26 23:53:14

Hi Juliet,

Juliet Kim [off-list ref] writes:

On 6/25/19 12:29 PM, Nathan Lynch wrote:


Juliet Kim [off-list ref] writes:


However, that fix failed to notify the hypervisor that the LPM attempt
had been abandoned, which results in a system hang.


It is surprising to me that leaving a migration unterminated would cause
Linux to hang. Can you explain more about how that happens?

PHYP will block further requests (the next partition migration, DLPAR, etc.)
while it is in the suspending state. That would have a follow-on effect on the
HMC and potentially on this and other partitions.

I can believe that operations on _this LPAR_ would be blocked by the
platform and/or management console while the migration remains
unterminated, but the OS should not be able to perpetrate a denial of
service on other partitions or the management console simply by botching
the LPM protocol. If it can, that's not Linux's bug to fix.


Fix this by signaling PHYP to cancel the migration, so that PHYP
can stop waiting and clean up the migration.

This is well spotted, and rtas_ibm_suspend_me() needs to signal
cancellation in several error paths. But I don't agree that this is one
of them: this race is going to be a temporary condition in any
production setting, and retrying would allow the migration to
succeed.
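To illustrate the retry argument, here is a minimal user-space model of a caller that retries only while the failure is transient. It is a sketch under stated assumptions, not kernel code: try_suspend(), suspend_with_retry(), the attempts_made counter, and the bounded retry count are all hypothetical, and reporting the CPU-offline race as -EAGAIN is an assumption of this model.

```c
#include <errno.h>

/*
 * Hypothetical model, not kernel code. try_suspend() stands in for the
 * real suspend attempt: it fails with -EAGAIN on the first two calls to
 * simulate losing the race with a concurrent CPU offline, then succeeds.
 */
int attempts_made;

int try_suspend(void)
{
	attempts_made++;
	if (attempts_made < 3)
		return -EAGAIN;	/* transient: raced with CPU offline */
	return 0;
}

/*
 * Retry up to max_tries times, but only while the failure is transient.
 * Any other error (or success) ends the loop immediately.
 */
int suspend_with_retry(int max_tries)
{
	int rc = -EAGAIN;

	for (int i = 0; i < max_tries && rc == -EAGAIN; i++)
		rc = try_suspend();
	return rc;
}
```

With five allowed tries the model succeeds on the third attempt; with only two it gives up and still reports -EAGAIN, leaving the decision to retry later with the caller.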

If LPM and CPU offline requests conflict with one another, it might be better
to let them fail and let the customer decide which one they prefer.

Hmm I don't think so. When (if ever) this happens in production it would
be the result of an unlucky race with a power management daemon or
similar, not a conscious decision of the administrator in the moment.

IBM i cancels the migration if other OS components or operations veto
it. This is consistent with how other operating systems behave during LPM.

But this situation isn't really like that. If we were to have a real
veto mechanism, it would only make sense to have it run as early as
possible, before the platform has done a bunch of work. This benign,
recoverable race is occurring right before we complete the migration,
which at this point has been copying state to the destination for
minutes or hours. It doesn't make sense to error out like this.

As I mentioned earlier though, it does make sense to signal a
cancellation for these less-recoverable error conditions in
rtas_ibm_suspend_me():

- rtas_online_cpus_mask() failure
- alloc_cpumask_var() failure
- the atomic_read(&data.error) != 0 case after returning from the IPI
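The policy proposed above (cancel on the less-recoverable failures, return without cancelling on the benign race) can be sketched as a small user-space model. This is not the kernel's rtas_ibm_suspend_me(): suspend_me_model(), signal_cancel(), cancel_signaled, and enum fail_point are hypothetical names, the actual cancellation mechanism is not shown, and the specific error codes (-ENOMEM, -EIO, -EAGAIN) are illustrative assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical failure points, one per error path discussed above. */
enum fail_point {
	FAIL_NONE,
	FAIL_ALLOC_MASK,	/* alloc_cpumask_var() failure */
	FAIL_ONLINE_CPUS,	/* rtas_online_cpus_mask() failure */
	FAIL_CPU_RACE,		/* benign race with a concurrent CPU offline */
	FAIL_IPI_ERROR,		/* data.error != 0 after returning from the IPI */
};

bool cancel_signaled;

/* Hypothetical stand-in for asking the platform to abandon the migration. */
void signal_cancel(void)
{
	cancel_signaled = true;
}

int suspend_me_model(enum fail_point fp)
{
	int rc;

	cancel_signaled = false;

	switch (fp) {
	case FAIL_ALLOC_MASK:
		rc = -ENOMEM;
		break;
	case FAIL_ONLINE_CPUS:
	case FAIL_IPI_ERROR:
		rc = -EIO;	/* illustrative code only */
		break;
	case FAIL_CPU_RACE:
		/* Transient: return without cancelling so the caller can retry. */
		return -EAGAIN;
	default:
		return 0;
	}

	/* Less-recoverable failure: tell the platform the attempt is over. */
	signal_cancel();
	return rc;
}
```

The point of separating the cases is that only the race path leaves the migration pending for a retry; every other failure path tells the platform to stop waiting.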
