Thread: 10 messages, 6 authors, 2017-07-27

Re: KVM guests freeze under upstream kernel

From: <hidden>
Date: 2017-07-26 13:19:09

On Thu, Jul 20, 2017 at 10:18:18PM -0300, joserz@linux.vnet.ibm.com wrote:
On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
On Thu, Jul 20, 2017 at 12:02:23AM -0300, joserz@linux.vnet.ibm.com wrote:
On Thu, Jul 20, 2017 at 09:42:50AM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2017-07-19 at 16:46 -0300, joserz@linux.vnet.ibm.com wrote:
Hello!

We're not able to boot any KVM guest using upstream kernel (cb8c65ccff7f77d0285f1b126c72d37b2572c865 - 4.13.0-rc1+).
After reaching the SLOF initial counting, the guest simply freezes:
Can you send your .config?
Sure,

Answering Michael as well:

It's a P9 with RHEL kernel 4.11.0-10.el7a.ppc64le installed. The problem
was noticed with kernel > 4.13 (I'm currently running 4.13.0-rc1+).

QEMU is https://github.com/dgibson/qemu (ppc-for-2.10) but I gave the
default packaged Qemu a try.

For the guest, I tried both a vanilla Ubuntu 17.04 kernel and the host kernel,
but they never had a chance to run since the freeze happens in SLOF.

Note that with the 4.11.0-10.el7a.ppc64le kernel it works fine
(for any of these QEMU/guest setups). With 4.13.0-rc1 it runs only after
I revert the commit referred to above.
Is the host kernel running in radix mode?
yes
Did you check the host kernel logs for any oops messages?
dmesg was clean, but after waiting for some time (I had left QEMU running in
another terminal and forgotten about it) I got the oops below. After rebooting
the host I couldn't reproduce it.

Another test that I did was:
Compile with transparent huge pages disabled: KVM works fine
Compile with transparent huge pages enabled: doesn't work
  + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work
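
For anyone repeating the runtime half of this test, a minimal sketch of how the
active THP mode could be read back is shown here. It assumes the usual sysfs
file /sys/kernel/mm/transparent_hugepage/enabled, whose content looks like
"always [madvise] never" with the active mode in brackets. The function names
are hypothetical helpers for illustration, not part of any kernel or QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode out of a line such as
 * "always [madvise] never". Returns 0 on success, -1 if no bracketed
 * token is found or it does not fit in out (including the NUL).
 * Hypothetical helper, for illustration only.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;

    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/*
 * Read the sysfs file and extract the active mode.
 * Returns -1 if the file cannot be opened or read, or cannot be parsed.
 */
int thp_read_active_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    return thp_active_mode(line, out, outlen);
}
```

Calling thp_read_active_mode() with a small buffer before starting the guest
would record which mode the host was actually in for a given run.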

Just out of my own curiosity I made this small change:
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include
index c0737c8..f94a3b6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -80,7 +80,7 @@
 
 #define _PAGE_SOFT_DIRTY       _RPAGE_SW3 /* software: software dirty tracking
 #define _PAGE_SPECIAL          _RPAGE_SW2 /* software: special page */
-#define _PAGE_DEVMAP           _RPAGE_SW1 /* software: ZONE_DEVICE page */
+#define _PAGE_DEVMAP           _RPAGE_RSV3
 #define __HAVE_ARCH_PTE_DEVMAP
and it works. I chose _RPAGE_RSV3 because it has the same value that
x86 uses (0x0400000000000000UL), but I don't know whether it could have any
side effects.
Does this change make any sense to you people?
I didn't see any side effects, except that device-backed memory will have
a bigger address space in transparent huge pages, if I understand that
correctly.
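
As a quick sanity check on the value quoted above, the following sketch works
out which bit position 0x0400000000000000UL corresponds to and shows a plain
mask test against it. All names here (EXAMPLE_FLAG_MASK and the example_*
functions) are hypothetical and are not the kernel's PTE accessors; the only
input taken from this message is the constant itself.

```c
#include <assert.h>
#include <stdint.h>

/* The value quoted in the message; the macro name is hypothetical. */
#define EXAMPLE_FLAG_MASK 0x0400000000000000ULL

/*
 * Return the bit index (0 = least significant) of a mask that has exactly
 * one bit set, or -1 if the mask is zero or has several bits set.
 */
int example_single_bit_index(uint64_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;

    int index = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Return 1 if every bit of mask is set in value, otherwise 0. */
int example_has_flag(uint64_t value, uint64_t mask)
{
    return (value & mask) == mask;
}
```

For the quoted constant, example_single_bit_index() yields 58, that is, a bit
in the top byte of a 64-bit entry.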

If so I can send a patch with this change.

Thank you!!