Re: KVM guests freeze under upstream kernel
From: <hidden>
Date: 2017-07-26 13:19:09
On Thu, Jul 20, 2017 at 10:18:18PM -0300, joserz@linux.vnet.ibm.com wrote:
On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
On Thu, Jul 20, 2017 at 12:02:23AM -0300, joserz@linux.vnet.ibm.com wrote:
On Thu, Jul 20, 2017 at 09:42:50AM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2017-07-19 at 16:46 -0300, joserz@linux.vnet.ibm.com wrote:
Hello! We're not able to boot any KVM guest using the upstream kernel (cb8c65ccff7f77d0285f1b126c72d37b2572c865 - 4.13.0-rc1+). After reaching the SLOF initial counting, the guest simply freezes:

Can you send your .config?

Sure. Answering Michael as well: it's a P9 with RHEL kernel 4.11.0-10.el7a.ppc64le installed. The problem was noticed with kernel > 4.13 (I'm currently running 4.13.0-rc1+). QEMU is https://github.com/dgibson/qemu (ppc-for-2.10), but I also gave the default packaged QEMU a try. For the guest, I tried both a vanilla Ubuntu 17.04 and the host kernel, but they never had a chance to run since the freeze happened in SLOF.

Note that with the 4.11.0-10.el7a.ppc64le kernel it works fine (for any of these QEMU/guest setups). With 4.13.0-rc1 it runs after reverting the referred commit.

Is the host kernel running in radix mode?

yes
Did you check the host kernel logs for any oops messages?

dmesg was clean, but after some time waiting (I forgot QEMU running in another terminal) I got the oops below (after rebooting the host I couldn't reproduce it again).

Another test that I did was:

  Compile with transparent huge pages disabled: KVM works fine
  Compile with transparent huge pages enabled: doesn't work
    + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work

Just out of my own curiosity I made this small change:

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include
index c0737c8..f94a3b6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -80,7 +80,7 @@
 #define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
-#define _PAGE_DEVMAP _RPAGE_SW1 /* software: ZONE_DEVICE page */
+#define _PAGE_DEVMAP _RPAGE_RSV3
 #define __HAVE_ARCH_PTE_DEVMAP

and it works. I chose _RPAGE_RSV3 because it uses the same value that x86 uses (0x0400000000000000UL), but I don't know if it could have any side effects.
Does this change make any sense to you people? I didn't see any side effects, except that device-backed memory will have a bigger address space in transparent huge pages, IF I understand that correctly. If so, I can send a patch with this change. Thank you!!
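
To make the bit change in the diff above concrete, here is a minimal userspace sketch, not kernel code. The RSV3 value is the one quoted in the message. The SW1 value (0x800) is an assumption about the 4.13-rc1 header and is not stated in this thread. The names sketch_pte_devmap() and sketch_bit_index() are hypothetical helpers invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the message for _RPAGE_RSV3 (same as the x86 devmap bit). */
#define SKETCH_RPAGE_RSV3 0x0400000000000000ULL

/* ASSUMPTION: value of _RPAGE_SW1 in the 4.13-rc1 header; not given in the thread. */
#define SKETCH_RPAGE_SW1  0x0000000000000800ULL

/* Hypothetical stand-in for a pte_devmap()-style check: is the chosen bit set? */
static bool sketch_pte_devmap(uint64_t pte, uint64_t devmap_bit)
{
    return (pte & devmap_bit) != 0;
}

/* Index of the single set bit in mask, or -1 if mask is not a one-bit mask. */
static int sketch_bit_index(uint64_t mask)
{
    int idx = 0;

    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;
    while ((mask >>= 1) != 0)
        idx++;
    return idx;
}
```

Under these assumptions, a PTE value that happens to have only the low software bit set is reported as devmap when the check uses SKETCH_RPAGE_SW1, and as not devmap when it uses SKETCH_RPAGE_RSV3. That only shows what the one-line change alters in the test. It does not establish which code path sets the bit or whether the change is the right fix.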