Thread (8 messages) 8 messages, 3 authors, 2019-07-18

RE: [PATCH] x86/hyper-v: Zero out the VP assist page to fix CPU offlining

From: Thomas Gleixner <hidden>
Date: 2019-07-18 07:01:06
Also in: lkml

On Thu, 18 Jul 2019, Dexuan Cui wrote:
quoted hunk ↗ jump to hunk
quoted
On Thu, 4 Jul 2019, Dexuan Cui wrote:
This is the allocation when the CPU is brought online for the first
time. So what effect has zeroing at allocation time vs. offlining and
potentially receiving IPIs? That allocation is never freed.

Neither the comment nor the changelog make any sense to me.
	tglx
That allocation was introduced by the commit
a46d15cc1ae5 ("x86/hyper-v: allocate and use Virtual Processor Assist Pages").

I think it's ok to not free the page when a CPU is offlined: every
CPU uses only 1 page and CPU offlining is not really a very usual
operation except for the scenario of hibernation (and suspend-to-memory), 
where the CPUs are quickly onlined again, when we resume from hibernation.
IMO Vitaly intentionally decided to not free the page for simplicity of the
code.

When a CPU (e.g. CPU1) is being onlined, in hv_cpu_init(), we allocate the
VP_ASSIST_PAGE page and enable the PV EOI optimization for this CPU by
writing the MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, this CPU
*always* uses hvp->apic_assist (which is updated by the hypervisor) to
decide if it needs to write the EOI MSR:

static void hv_apic_eoi_write(u32 reg, u32 val)
{
        struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()];

        if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1))
                return;

        wrmsr(HV_X64_MSR_EOI, val, 0);
}

When a CPU (e.g. CPU1) is being offlined, on this CPU, we do:
1. in hv_cpu_die(), we disable the PV EOI optimizaton for this CPU;
2. we finish the remaining work of stopping this CPU;
3. this CPU is completed stopped.

Between 1 and 3, this CPU can still receive interrupts (e.g. IPIs from CPU0,
and Local APIC timer interrupts), and this CPU *must* write the EOI MSR for
every interrupt received, otherwise the hypervisor may not deliver further
interrupts, which may be needed to stop this CPU completely.

So we need to make sure hvp->apic_assist.bit0 is zero, after we run the line
"wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);" in hv_cpu_die(). The easiest
way is what I do in this patch. Alternatively, we can use the below patch:
@@ -188,8 +188,12 @@ static int hv_cpu_die(unsigned int cpu)
        local_irq_restore(flags);
        free_page((unsigned long)input_pg);

-       if (hv_vp_assist_page && hv_vp_assist_page[cpu])
+       if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
+               local_irq_save(flags);
                wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
+               hvp->apic_assist &= ~1;
+               local_irq_restore(flags);
+       }

        if (hv_reenlightenment_cb == NULL)
                return 0;
This second version needs 3+ lines, so I prefer the one-line version. :-)
Those are two different things. The GPF_ZERO allocation makes sense on it's
own but it _cannot_ prevent the following scenario:

    cpu_init()
      if (!hvp)
      	 hvp = vmalloc(...., GFP_ZERO);
    ...

    hvp->apic_assist |= 1;

#1   cpu_die()
      if (....)
           wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);

   ---> IPI
   	if (!(hvp->apic_assist & 1))	
	   wrmsr(APIC_EOI);    <- PATH not taken

#3   cpu is dead

    cpu_init()
       if (!hvp)
          hvp = vmalloc(....m, GFP_ZERO);  <- NOT TAKEN because hvp != NULL

So you have to come up with a better fairy tale why GFP_ZERO 'fixes' this.

Allocating hvp with GFP_ZERO makes sense on it's own so the allocated
memory has a defined state, but that's a different story.

The 3 liner patch above makes way more sense and you can spare the
local_irq_save/restore by moving the whole condition into the
irq_save/restore region above.

Thanks,

	tglx
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help