Re: [PATCHv7 02/14] mm: Add support for unaccepted memory
From: Tom Lendacky <thomas.lendacky@amd.com>
Date: 2022-09-26 15:08:03
Also in: linux-efi, linux-mm, lkml
On 9/26/22 07:10, Kirill A. Shutemov wrote:
> On Sat, Sep 24, 2022 at 04:03:02AM +0300, Kirill A. Shutemov wrote:
>> On Thu, Sep 22, 2022 at 09:31:12AM -0500, Tom Lendacky wrote:
>>> On 9/8/22 14:28, Mike Rapoport wrote:
>>>> On Thu, Sep 08, 2022 at 09:23:07AM -0700, Dionna Amalie Glaze wrote:
>>>>>> Looks like the first access to the memory map fails, although I
>>>>>> think it's not in INIT_LIST_HEAD() but rather in
>>>>>> init_page_count().
>>>>>>
>>>>>> I'd start with making sure that page_alloc::memmap_alloc()
>>>>>> actually returns accepted memory. If you build the kernel with
>>>>>> CONFIG_DEBUG_VM=y the memory map will be poisoned in this
>>>>>> function, so my guess is it'd crash there.
>>>>>
>>>>> That's a wonderful hint, thank you! I did not run this test with
>>>>> CONFIG_DEBUG_VM set, but you think it's possible it could still
>>>>> be here?
>>>>
>>>> It depends on how you configured your kernel. Say, defconfig does
>>>> not set it.
>>>
>>> I also hit the issue at 256GB. My config is using
>>> CONFIG_SPARSEMEM_VMEMMAP and fails in memmap_init_range() when
>>> attempting to add the first PFN. It looks like the underlying page
>>> that is backing the vmemmap has not been accepted (I receive a #VC
>>> 0x404 => page not validated).
>>>
>>> Kirill, is this a path that you've looked at? It would appear that
>>> somewhere in the vmemmap_populate_hugepages() path, some memory
>>> acceptance needs to be done for the pages that are used to back
>>> vmemmap. I'm not very familiar with this code, so I'm not sure why
>>> everything works for a guest with 255GB of memory, but then fails
>>> for a guest with 256GB of memory.
>>
>> Hm. I don't have a machine that large at hand at the moment. And I
>> have not looked at the codepath before. I will try to look into the
>> issue.
>
> I'm not able to trigger the bug.
>
> With the help of vm.overcommit_memory=1, I managed to boot a TDX
> guest to shell with 256G and 1T of guest memory just fine.
>
> Any chance it is SEV-SNP specific?
There's always a chance. I'll do some more tracing and see what I can find to try and be certain.
> Or maybe there is some difference in kernel config? Could you share yours?
Yes, I'll send that to you off-list.

Thanks,
Tom