Thread (40 messages) 40 messages, 8 authors, 2024-11-20

Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT_COCO exit type

From: Michael Roth <hidden>
Date: 2024-11-01 22:04:09
Also in: kvm, lkml

On Fri, Nov 01, 2024 at 01:53:26PM -0700, Dionna Amalie Glaze wrote:
On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson [off-list ref] wrote:
quoted
On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote:
quoted
We can extend the ccp driver to, on extended guest request, lock the
command buffer, get the REPORTED_TCB, complete the request, unlock the
command buffer, and return both the response and the REPORTED_TCB at
the time of the request.
Holding a lock across an exit to userspace seems wildly unsafe.
I wasn't suggesting this. I was suggesting adding a special ccp symbol
that would perform two sev commands under the same lock to ensure we
know the REPORTED_TCB that was used to derive the VCEK that signs an
attestation report in the MSG_REPORT_REQ guest request. We use that
atomicity to be sure that when we exit to user space to request
certificates that we're getting the right version certificates.
quoted
Can you explain the race that you are trying to close, with the exact "bad" sequence
of events laid out in chronological order, and an explanation of why the race can't
be sovled in userspace?  I read through your previous comment[*] (which I assume
is the race you want to close?), but I couldn't quite piece together exactly what's
broken.
Hi Dionna,
1. the control plane delivers a firmware update. Current TCB version
goes up. The machine signals that it needs new certificates before it
can commit.
2. VM performs an extended guest request.
3. KVM exits to user space to get certificates before getting the
report from firmware.
4. [what I understand Michael Roth was suggesting] User space grabs a
file lock to see if it can read the cached certificates. It reads the
certificates and releases the lock before returning to KVM.
5. the control plane delivers the certificates to the machine and
tells it to commit. The machine grabs the certificate file lock, runs
SNP_COMMIT, and releases the file lock. This command updates both
COMMITTED_TCB and REPORTED_TCB.
6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's
a different REPORTED_TCB.
7. Guest receives the wrong certificates for certifying the report it
just received.

The fact that 4 has to release the lock before getting the attestation
report is the problem.
We wouldn't actually release the lock before getting the attestation
report. There's more specifics on the suggested flow in the documentation
update accompanying this patch:

+    NOTE: In the case of SEV-SNP, the endorsement key used by firmware may
+    change as a result of management activities like updating SEV-SNP firmware
+    or loading new endorsement keys, so some care should be taken to keep the
+    returned certificate data in sync with the actual endorsement key in use by
+    firmware at the time the attestation request is sent to SNP firmware. The
+    recommended scheme to do this is:
+
+      - The VMM should obtain a shared or exclusive lock on the path the
+        certificate blob file resides at before reading it and returning it to
+        KVM, and continue to hold the lock until the attestation request is
+        actually sent to firmware. To facilitate this, the VMM can set the
+        ``immediate_exit`` flag of kvm_run just after supplying the certificate
+        data, and just before and resuming the vCPU. This will ensure the vCPU
+        will exit again to userspace with ``-EINTR`` after it finishes fetching
+        the attestation request from firmware, at which point the VMM can
+        safely drop the file lock.
+
+      - Tools/libraries that perform updates to SNP firmware TCB values or
+        endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
+        ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
+        Documentation/virt/coco/sev-guest.rst for more details) in such a way
+        that the certificate blob needs to be updated, should similarly take an
+        exclusive lock on the certificate blob for the duration of any updates
+        to endorsement keys or the certificate blob contents to ensure that
+        VMMs using the above scheme will not return certificate blob data that
+        is out of sync with the endorsement key used by firmware.

So #5 would not be able to obtain an exclusive file lock until userspace
receives confirmation that the attestation request was processed by
firmware. At that point it will be an accurate reflection of the
attestation state associated with that particular version of the
certificates that was fetched from userspace. So at that point the,
transaction is done at that point and userspace can safely release the lock.

-Mike
If we instead get the report and know what the REPORTED_TCB was when
serving that request, then we can exit to user space requesting the
certificates for the report in hand.
A concurrent update can update the reported_tcb like in the above
scenario, but it won't interfere with certificates since the machine
should have certificates for both TCB_VERSIONs to provide until the
commit is complete.

I don't think it's workable to have 1 grab the file lock and for 5 to
release it. Waiting for a service to update stale certificates should
not block user attestation requests. It would make 4's failure to get
the lock return VMM_BUSY and eventually cause attestations to time out
in sev-guest.
quoted
[*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@mail.gmail.com (local)


-- 
-Dionna Glaze, PhD, CISSP, CCSP (she/her)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help