Thread: 99 messages, 15 authors, 2024-02-15

Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

From: Borislav Petkov <bp@alien8.de>
Date: 2024-02-09 17:31:27
Also in: lkml

On Thu, Feb 08, 2024 at 05:44:44AM -0600, Dr. Greg wrote:
> I guess a useful starting point would be if AMD would like to offer
> any type of quantification for 'astronomically small' when it comes to
> the probability of failure over 10 RDRAND attempts... :-)

Right, let's establish the common ground first: please have a look at
this whitepaper, albeit a bit outdated:

https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/amd-random-number-generator.pdf

in case you haven't seen it yet.

Now, considering that the entropy source is a finite resource, you can
imagine that there can be scenarios where it is depleted.

And newer Zen generations perform significantly better. So much so that
on Zen3 and later, 10 retries should never observe a failure unless the
hardware is bad. Also, I agree with hpa's note that any and all retries
should be time-based.
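
A time-based retry can be illustrated with a minimal userspace sketch. It is not kernel code and not anything proposed in this thread. The names `rng_step_fn`, `rng_read_timed`, `flaky_rng` and `flaky_step` are hypothetical, the step callback stands in for a single RDRAND/RDSEED attempt, and the time budget is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: one attempt at the hardware RNG. Returns true (think
 * CF=1) with *out filled, false (think CF=0) if no data was ready. */
typedef bool (*rng_step_fn)(uint64_t *out, void *ctx);

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Retry until an attempt succeeds or budget_ns has elapsed.
 * Always makes at least one attempt. */
static bool rng_read_timed(rng_step_fn step, void *ctx,
			   uint64_t budget_ns, uint64_t *out)
{
	const uint64_t start = now_ns();

	do {
		if (step(out, ctx))
			return true;
	} while (now_ns() - start < budget_ns);

	return false;
}

/* Stand-in for the hardware: fails a fixed number of times, then
 * succeeds. Purely illustrative. */
struct flaky_rng {
	uint64_t failures_left;
	uint64_t value;
};

static bool flaky_step(uint64_t *out, void *ctx)
{
	struct flaky_rng *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return false;
	}
	*out = f->value;
	return true;
}
```

Bounding the loop by elapsed time instead of an attempt count means the retry window does not shrink as attempts get cheaper on faster cores.
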

> Secondly, given our test findings and those of RedHat, would it be
> safe to assume that EPYC has engineering that prevents RDSEED failures
> that Ryzen does not?

Well, roughly speaking, client is a less beefy and less performant
version of server. You can extrapolate that to the topic at hand.

But at least on AMD, any potential DoSing of RDRAND on client doesn't
matter for CoCo because client doesn't enable SEV*.

> Both AMD and Intel designs start with a hardware based entropy source.
> Intel samples thermal/quantum junction noise, AMD samples execution
> jitter over a bank of inverter based oscillators.

See above paper for the AMD side.

> An assumption of constant clocked sampling implies a maximum
> randomness bandwidth limit.

You said it.

> None of this implies that randomness is a finite resource

Huh? This contradicts what you just said in the above sentence.

Or maybe I'm reading this wrong...

> So this leaves the fundamental question of what does an RDRAND or
> RDSEED failure return actually imply?

Simple: if no random data is ready at the time the insn executes, it
says "invalid". Because the generator is a finite resource, as you said
above, software trying to pull random data faster than the generator
can produce it is the main case for CF=0.
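
The "pulling faster than it can be generated" case can be pictured with a toy model. This is only a sketch under invented assumptions: a small buffer that refills at a fixed rate per time step. The buffer depth, refill rate and all `toy_rng_*` names are hypothetical and do not describe any real AMD or Intel implementation.

```c
#include <stdbool.h>

/* Toy generator: a bounded buffer refilled at a fixed rate. */
struct toy_rng {
	unsigned int ready;		/* values currently buffered */
	unsigned int depth;		/* buffer capacity */
	unsigned int refill_per_tick;	/* values produced per time step */
};

/* Advance time by one step: produce new values, capped at depth. */
static void toy_rng_tick(struct toy_rng *g)
{
	g->ready += g->refill_per_tick;
	if (g->ready > g->depth)
		g->ready = g->depth;
}

/* One read attempt: false models CF=0 (nothing ready right now). */
static bool toy_rng_read(struct toy_rng *g)
{
	if (g->ready == 0)
		return false;
	g->ready--;
	return true;
}

/* Issue reads_per_tick attempts in each of ticks steps and count how
 * many of them would have come back "invalid". */
static unsigned int toy_rng_count_failures(struct toy_rng *g,
					   unsigned int ticks,
					   unsigned int reads_per_tick)
{
	unsigned int failures = 0;

	for (unsigned int t = 0; t < ticks; t++) {
		toy_rng_tick(g);
		for (unsigned int r = 0; r < reads_per_tick; r++) {
			if (!toy_rng_read(g))
				failures++;
		}
	}
	return failures;
}
```

Starting from a full buffer with depth 4 and a refill rate of 2, ten steps of 3 reads each yield 8 failed reads, while 2 reads per step never fail. In this model a failure only says that demand outran supply at that moment, not that the generator is broken.
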

> Silicon is an expensive resource, which would imply a queue depth
> limitation for access to the socket common RNG infrastructure. If the
> queue is full when an instruction issues, it would be a logical
> response to signal an instruction failure quickly and let software try
> again.

That's actually in the APM documenting RDRAND:

"If the returned value is invalid, software must execute the instruction
again."
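
The "execute the instruction again" rule, combined with the 10-attempt bound discussed in this thread, amounts to a loop like the following userspace sketch. The callback type and the `rng_read_retry`, `fake_rng` and `fake_rng_step` names are hypothetical. On x86 the callback could wrap the compiler's `_rdrand64_step()` intrinsic, but a fake is used here so the sketch runs on any machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one attempt at the hardware RNG. Returns true (think
 * CF=1) with *out filled, false (think CF=0) if the value is invalid. */
typedef bool (*rng_step_fn)(uint64_t *out, void *ctx);

/* Try up to max_tries times. Returns the number of attempts used on
 * success, or 0 if every attempt came back invalid. */
static unsigned int rng_read_retry(rng_step_fn step, void *ctx,
				   unsigned int max_tries, uint64_t *out)
{
	for (unsigned int i = 1; i <= max_tries; i++) {
		if (step(out, ctx))
			return i;
	}
	return 0;
}

/* Stand-in for the hardware: fails a fixed number of times, then
 * succeeds. Purely illustrative. */
struct fake_rng {
	unsigned int failures_left;
	uint64_t value;
};

static bool fake_rng_step(uint64_t *out, void *ctx)
{
	struct fake_rng *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return false;
	}
	*out = f->value;
	return true;
}
```

A caller would treat a return of 0 as the case that needs a policy decision (warn, fall back, or fail), which is the question this thread is about.
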

> Given the time and engineering invested in the engineering behind both
> TDX and SEV-SNP, it would seem unlikely that really smart engineers at
> both Intel and AMD didn't anticipate this issue and its proper
> resolution for CoCo environments.

You can probably imagine that no one can do a fully secure system in one
single attempt but rather needs to do an iterative process.

And I don't know how much you've followed those technologies but they
*are* the perfect example for such an iterative improvement process.

I hope this answers at least some of your questions.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette