Re: [PATCH 1/2] x86/random: Retry on RDSEED failure
From: Daniel P. Berrangé <hidden>
Date: 2024-01-30 14:43:29
Also in: lkml
On Tue, Jan 30, 2024 at 03:06:14PM +0100, Jason A. Donenfeld wrote:
Is that an accurate summary? If it is, then the actual problem is that the hardware provided to solve this problem doesn't actually solve it that well, so we're caught deciding between guest-guest DoS (some other guest on the system uses all RDRAND resources) and cryptographic failure because of a malicious host creating a deterministic environment.
In a CoCo VM environment, a guest DoS is not a unique threat scenario, as it is unrelated to confidentiality. Ensuring fair subdivision of resources between competing guests is just a general VM threat. There are many easy ways a host admin can stop a guest making computational progress. Simply not scheduling the guest vCPU threads is one. CoCo doesn't try to solve this problem. Preserving confidentiality is the primary aim of CoCo. IOW, if the guest boot is stalled because the kernel is spinning waiting on RDRAND to return data, that's fine. If the kernel panics after "n" RDRAND failures in a row, that's fine too. They are both just yet another DoS scenario. If the kernel ignores the RDRAND failure and lets the guest boot with degraded RNG state, where it is susceptible to attacks, that would not be OK for CoCo.
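To make the distinction concrete, here is a rough userspace sketch of the two acceptable behaviours (spin, or fail hard after "n" failures), as opposed to silently continuing with degraded state. This is not kernel code and not taken from the patch: `rng_get_strict()`, `rng_step_fn`, `RNG_FAIL_LIMIT` and the fake source are all hypothetical names made up for illustration, and the limit of 10 is arbitrary. The fake source stands in for a single RDRAND attempt so the logic can be exercised without the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one RDRAND attempt: returns true and fills
 * *out on success, returns false (leaving *out untouched) on failure. */
typedef bool (*rng_step_fn)(uint64_t *out);

enum rng_policy { RNG_POLICY_SPIN, RNG_POLICY_FAIL_HARD };
enum rng_status { RNG_OK, RNG_FATAL };

/* The "n" from the discussion. The value is illustrative only. */
#define RNG_FAIL_LIMIT 10u

/*
 * Either return real output from the source, or report a fatal error.
 * There is deliberately no path that hands back a weaker fallback value.
 */
static enum rng_status rng_get_strict(rng_step_fn step,
                                      enum rng_policy policy,
                                      uint64_t *out)
{
    unsigned int failures = 0;

    assert(step != NULL && out != NULL);

    while (!step(out)) {
        if (++failures == RNG_FAIL_LIMIT) {
            /* Diagnostic so a stall can be traced back to the RNG. */
            fprintf(stderr, "rng: %u consecutive failures\n", failures);
            if (policy == RNG_POLICY_FAIL_HARD)
                return RNG_FATAL; /* caller is expected to halt */
        }
        /* RNG_POLICY_SPIN: keep retrying; this is only a DoS. */
    }
    return RNG_OK;
}

/* Fake source for experimentation: fails a set number of times, then
 * returns a fixed placeholder value (obviously not random). */
static unsigned int fake_failures_left;

static bool fake_rdrand(uint64_t *out)
{
    if (fake_failures_left > 0) {
        fake_failures_left--;
        return false;
    }
    *out = 0x1234u;
    return true;
}
```

With `RNG_POLICY_SPIN` the call only returns once the source succeeds, which is the "stalled boot" case. With `RNG_POLICY_FAIL_HARD` the caller gets `RNG_FATAL` at the limit and is expected to stop, which is the "panic" case. Both are a DoS at worst.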
But I have two questions: 1) Is this CoCo VM stuff even real? Is protecting guests from hosts actually possible in the end? Is anybody doing this? I assume they are, so maybe ignore this question, but I would like to register my gut feeling that on the Intel platform this seems like an endless whack-a-mole problem like SGX.
It is real, but it is also not perfect. I expect it /will/ be an endless whack-a-mole problem though. Nonetheless, it is a significant layer of defence, as compared to traditional VMs where the guest RAM is nothing more than a 'cat' command away from host admin exposure.
2) Can a malicious host *actually* create a fully deterministic environment? One that'll produce the same timing for the jitter entropy creation, and all the other timers and interrupts and things? I imagine the attestation part of CoCo means these VMs need to run on real Intel silicon and so it can't be single stepped in TCG or something, right? So is this problem actually a real one? And to what degree? Any good experimental research on this? Either way, if you're convinced RDRAND is the *only* way here, adding a `WARN_ON(is_in_early_boot)` to the RDRAND (but not RDSEED) failure path seems a fairly lightweight bandaid. I just wonder if the hardware people could come up with something more reliable that we wouldn't have to agonize over in the kernel.
If RDRAND failure is more of a theoretical problem than a practical real world problem, I'd be inclined to just let the kernel loop on RDRAND failure until it succeeds, with a WARN after 'n' iterations to aid diagnosis of the stall in the unlikely event it did hit.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|