Thread: 178 messages, 11 authors, 2022-06-06

Re: [PATCH Part2 RFC v4 09/40] x86/fault: Add support to dump RMP entry on fault

From: Dave Hansen <hidden>
Date: 2021-07-08 16:58:57
Also in: kvm, linux-crypto, linux-efi, linux-mm, lkml, platform-driver-x86

On 7/8/21 9:48 AM, Brijesh Singh wrote:
On 7/8/21 10:30 AM, Dave Hansen wrote:
The reason for iterating through the 2MB region is this: if the faulting
address is not assigned in the RMP table, and the page table walk level
is 2MB, then one of the entries within the large page is the root cause
of the fault. Since we don't know which entry, I dump all the non-zero
entries.
Logically you can figure this out though, right?  Why throw 511 entries
at the console when we *know* they're useless?
Logically it's going to be tricky to figure out exactly which entry
caused the fault, hence I dump any non-zero entry. I understand it may
dump some useless ones.
What's tricky about it?

Sure, there's a possibility that more than one entry could contribute to
a fault.  But, you always know *IF* an entry could contribute to a fault.

I'm fine if you run through the logic, don't find a known reason
(specific RMP entry) for the fault, and dump the whole table in that
case.  But, unconditionally polluting the kernel log with noise isn't
very nice for debugging.
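
A minimal sketch of the flow described above: check the faulting entry
first, then look for a known culprit inside the 2MB region, and only
fall back to a raw dump of non-zero entries when no known reason is
found. Everything here is an assumption for illustration: struct
rmp_entry, entry_may_fault(), and dump_rmp_for_fault() are hypothetical
names, the entry layout is simplified and is not the layout from the
AMD APM or the patch, and the rule "assigned or in-use can contribute to
a fault" is a stand-in for the real RMP check logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ENTRIES_PER_2MB 512 /* 2MB region / 4KB pages */

/* Hypothetical, simplified RMP entry; not the hardware layout. */
struct rmp_entry {
	bool assigned;   /* page is guest private */
	bool in_use;     /* hardware "in-use" bit */
	uint64_t raw_lo; /* remaining raw bits, possibly undocumented */
	uint64_t raw_hi;
};

/* Assumed rule: an assigned or in-use entry can contribute to a fault. */
static bool entry_may_fault(const struct rmp_entry *e)
{
	return e->assigned || e->in_use;
}

static bool entry_is_nonzero(const struct rmp_entry *e)
{
	return e->assigned || e->in_use || e->raw_lo || e->raw_hi;
}

static void dump_entry(size_t idx, const struct rmp_entry *e)
{
	printf("RMP[%zu]: assigned=%d in_use=%d raw=%016llx %016llx\n",
	       idx, e->assigned, e->in_use,
	       (unsigned long long)e->raw_lo,
	       (unsigned long long)e->raw_hi);
}

/*
 * region points at the 512 entries covering the 2MB-aligned range that
 * contains the fault; fault_idx is the faulting 4KB page within it.
 * Returns the number of entries dumped.
 */
static size_t dump_rmp_for_fault(const struct rmp_entry *region,
				 size_t fault_idx, bool is_2mb)
{
	size_t first = is_2mb ? 0 : fault_idx;
	size_t last = is_2mb ? ENTRIES_PER_2MB - 1 : fault_idx;
	size_t dumped = 0;
	size_t i;

	assert(fault_idx < ENTRIES_PER_2MB);

	/* Step 1: the faulting entry itself explains the fault. */
	if (entry_may_fault(&region[fault_idx])) {
		dump_entry(fault_idx, &region[fault_idx]);
		return 1;
	}

	/* Step 2: for a 2MB mapping, dump only entries that could contribute. */
	for (i = first; i <= last; i++) {
		if (entry_may_fault(&region[i])) {
			dump_entry(i, &region[i]);
			dumped++;
		}
	}
	if (dumped)
		return dumped;

	/* Step 3: no known reason found; dump every non-zero raw entry. */
	for (i = first; i <= last; i++) {
		if (entry_is_nonzero(&region[i])) {
			dump_entry(i, &region[i]);
			dumped++;
		}
	}
	return dumped;
}
```

With this ordering, a fault explained by a specific entry produces only
the lines for the entries that can explain it, and the full raw dump is
reserved for the case where none of the known checks match.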
There are two cases which we need to consider:

1) the faulting page is a guest private (aka assigned) page
2) the faulting page is a hypervisor (aka shared) page

We will primarily be seeing #1. In this case, we know it's an assigned
page, and we can decode the fields.

The #2 will happen in rare conditions,
What rare conditions?
One such condition is that the RMP "in-use" bit is set; see patch 20/40.
After applying the patch we should not see the "in-use" bit set. If we
run into similar issues, a full RMP dump will greatly help debugging.
OK... so dump the "in-use" bit here if you see it.
if it happens, one of the undocumented bits in the RMP entry can
provide us some useful information, hence we dump the raw values.
You're saying that there are things that can cause RMP faults that
aren't documented?  That's rather nasty for your users, don't you think?
The "in-use" bit in the RMP entry caught me off guard. The AMD APM does
say that hardware sets the in-use bit, but it *never* explained in
detail how to check if the fault was due to the in-use bit in the RMP
table. As I said, the documentation folks will be updating the RMP entry
to document the in-use bit. I hope we will not see any other
undocumented surprises; I am keeping my fingers crossed :)
Oh, ok.  That sounds fine.  Documentation is out of date all the time.