Re: [PATCH Part2 RFC v4 10/40] x86/fault: Add support to handle the RMP fault for user address
From: Brijesh Singh <hidden>
Date: 2021-07-12 15:43:59
Also in:
kvm, linux-coco, linux-efi, linux-mm, lkml, platform-driver-x86
Hi Dave,

On 7/8/21 11:16 AM, Dave Hansen wrote:
> "SIGBUG"?

It's a typo; it should be SIGBUS.

> > +
> > +	if (unlikely(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
> > +		return RMP_FAULT_KILL;
>
> Shouldn't this be a WARN_ON_ONCE()? How can we get RMP faults without SEV-SNP?

Yes, we should *not* get an RMP fault if SEV-SNP is not enabled. I can use WARN_ON_ONCE().

> > +	/* Get the native page level */
> > +	pte = lookup_address_in_mm(current->mm, address, &level);
> > +	if (unlikely(!pte))
> > +		return RMP_FAULT_KILL;
>
> What would this mean? There was an RMP fault on a non-present page? How could that happen? What if there was a race between an unmapping event and the RMP fault delivery?

We should not get an RMP fault for a non-present page. But you make a good point that there may be a race between the unmap event and the RMP fault. Instead of terminating the process, we should simply retry.

> > +	pfn = pte_pfn(*pte);
> > +	if (level > PG_LEVEL_4K) {
> > +		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
> > +		pfn |= (address >> PAGE_SHIFT) & mask;
> > +	}
>
> This looks inherently racy. What happens if there are two parallel RMP faults on the same 2M page. One of them splits the page tables, the other gets a fault for an already-split page table.
>
> Is that handled here somehow?

Yes. In this particular case we simply retry, and the hardware should re-evaluate the page level and take the corrective action.
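
To make the PFN arithmetic in the quoted hunk concrete, here is a minimal userspace sketch for the 2M case. It is only an illustration: the `model_*` and `MODEL_*` names are hypothetical stand-ins, and the shift values assume the usual x86-64 layout (4K base pages, 512 entries per page-table level, level 1 = 4K, level 2 = 2M).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants (assumed x86-64 values). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PTE_SHIFT   9

enum { MODEL_LEVEL_4K = 1, MODEL_LEVEL_2M = 2 };

/* Number of 4K pages covered by one mapping at the given level. */
static uint64_t model_pages_per_hpage(int level)
{
	return 1ULL << ((level - 1) * MODEL_PTE_SHIFT);
}

/*
 * Mirror the quoted computation: start from the PFN stored in the leaf
 * entry and, for a large mapping, OR in the index bits of the faulting
 * address selected by the mask.
 */
static uint64_t model_fault_pfn(uint64_t base_pfn, uint64_t address, int level)
{
	uint64_t pfn = base_pfn;

	if (level > MODEL_LEVEL_4K) {
		uint64_t mask = model_pages_per_hpage(level) -
				model_pages_per_hpage(level - 1);

		pfn |= (address >> MODEL_PAGE_SHIFT) & mask;
	}
	return pfn;
}
```

For a 2M mapping the mask is 512 - 1 = 511, so a fault at offset 0x3000 into a huge page whose base PFN is 0x1000 resolves to PFN 0x1003. A 4K mapping returns the base PFN unchanged.
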

> > +	/* Get the page level from the RMP entry. */
> > +	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> > +	if (!e)
> > +		return RMP_FAULT_KILL;
>
> The snp_lookup_page_in_rmptable() failure cases looks WARN-worthly. Either you're doing a lookup for something not *IN* the RMP table, or you don't support SEV-SNP, in which case you shouldn't be in this code in the first place.

Noted.

> > +	/*
> > +	 * Check if the RMP violation is due to the guest private page access.
> > +	 * We can not resolve this RMP fault, ask to kill the guest.
> > +	 */
> > +	if (rmpentry_assigned(e))
> > +		return RMP_FAULT_KILL;
>
> No "We's", please. Speak in imperative voice.

Noted.

> > +	/*
> > +	 * The backing page level is higher than the RMP page level, request
> > +	 * to split the page.
> > +	 */
> > +	if (level > rmp_level)
> > +		return RMP_FAULT_PAGE_SPLIT;
>
> This can theoretically trigger on a hugetlbfs page. Right?

Yes, theoretically. In the current implementation, the VMM is enlightened not to use hugetlbfs for backing pages when creating SEV-SNP guests.
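
The decision flow of the handler discussed so far, with the non-present case changed from kill to retry, could be modelled roughly as below. This is a userspace sketch, not the patch itself: the `RMP_FAULT_*` names come from the quoted hunks but their values here are arbitrary, and `struct model_rmp_fault_info` is a hypothetical snapshot of the lookups the real handler performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Names follow the quoted patch; the numeric values are arbitrary. */
enum rmp_fault { RMP_FAULT_RETRY, RMP_FAULT_KILL, RMP_FAULT_PAGE_SPLIT };

/* Hypothetical snapshot of what the handler looks up for one fault. */
struct model_rmp_fault_info {
	bool pte_present;	/* the page-table lookup found an entry */
	bool rmp_entry_found;	/* the RMP table lookup succeeded */
	bool assigned;		/* RMP entry marks a guest-private page */
	int level;		/* page-table mapping level */
	int rmp_level;		/* level recorded in the RMP entry */
};

static enum rmp_fault
model_rmp_fault_action(const struct model_rmp_fault_info *f)
{
	/* May have raced with an unmap: let the access fault again. */
	if (!f->pte_present)
		return RMP_FAULT_RETRY;

	/* Should not happen; the real code would also warn here. */
	if (!f->rmp_entry_found)
		return RMP_FAULT_KILL;

	/* Guest-private page: the fault cannot be resolved. */
	if (f->assigned)
		return RMP_FAULT_KILL;

	/* Mapping is larger than the RMP entry: request a split. */
	if (f->level > f->rmp_level)
		return RMP_FAULT_PAGE_SPLIT;

	return RMP_FAULT_RETRY;
}
```

With this shape, two parallel faults on the same 2M page both converge: the first requests the split, and the second finds matching levels after the split and simply retries.
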

> I thought I asked about this before... more below...
>
> > +	return RMP_FAULT_RETRY;
> > +}
> > +
> >  /*
> >   * Handle faults in the user portion of the address space.  Nothing in here
> >   * should check X86_PF_USER without a specific justification: for almost
> > @@ -1298,6 +1350,7 @@ void do_user_addr_fault(struct pt_regs *regs,
> >  	struct task_struct *tsk;
> >  	struct mm_struct *mm;
> >  	vm_fault_t fault;
> > +	int ret;
> >  	unsigned int flags = FAULT_FLAG_DEFAULT;
> >
> >  	tsk = current;
> > @@ -1378,6 +1431,22 @@ void do_user_addr_fault(struct pt_regs *regs,
> >  	if (error_code & X86_PF_INSTR)
> >  		flags |= FAULT_FLAG_INSTRUCTION;
> >
> > +	/*
> > +	 * If its an RMP violation, try resolving it.
> > +	 */
> > +	if (error_code & X86_PF_RMP) {
> > +		ret = handle_user_rmp_page_fault(error_code, address);
> > +		if (ret == RMP_FAULT_PAGE_SPLIT) {
> > +			flags |= FAULT_FLAG_PAGE_SPLIT;
> > +		} else if (ret == RMP_FAULT_KILL) {
> > +			fault |= VM_FAULT_SIGBUS;
> > +			do_sigbus(regs, error_code, address, fault);
> > +			return;
> > +		} else {
> > +			return;
> > +		}
> > +	}
>
> Why not just have handle_user_rmp_page_fault() return a VM_FAULT_* code directly?

I don't have any strong reason against it. In the next rev, I can update it to use the VM_FAULT_* codes and call do_sigbus() etc.

> I also suspect you can just set VM_FAULT_SIGBUS and let the do_sigbus() call later on in the function do its work.
>
> > +static int handle_split_page_fault(struct vm_fault *vmf)
> > +{
> > +	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> > +		return VM_FAULT_SIGBUS;
> > +
> > +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> > +	return 0;
> > +}
>
> What will this do when you hand it a hugetlbfs page?

The VMM is updated to not use hugetlbfs when creating SEV-SNP guests, so we should not run into it.

-Brijesh