Thread: 23 messages, 7 authors, 2019-09-21

Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()

From: Willy Tarreau <w@1wt.eu>
Date: 2019-09-20 19:37:55
Also in: linux-ext4, linux-man, lkml

On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
> Perhaps userland could register a helper that takes over and does
> something better?
If userland sees the failure it can do whatever the developer/distro
packager thought suitable for the system facing this condition.
> But I think the kernel really should do something
> vaguely reasonable all by itself.
Definitely, that's what Linus' proposal was doing. Sleeping for some time
is what I call "vaguely reasonable".
> If nothing else, we want the ext4
> patch that provoked this whole discussion to be applied,
Oh absolutely!
> which means
> that we need to unbreak userspace somehow, and returning garbage to it
> is not a good choice.
It depends on how it's used. I'd claim that we certainly use random
numbers for other things (such as ASLR or hash tables) *before* using
them to generate long-lived keys, so we have a bit more time to gather
entropy before reaching the point of producing these keys.
> Here are some possible approaches that come to mind:
>
> int count;
> while (crng isn't inited) {
>   msleep(1);
> }
>
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.
Without a timeout, we will certainly still hit situations where
it blocks forever, which is the current problem.
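
The timeout concern can be illustrated with a small user-space sketch. It is only an illustration under stated assumptions, not the kernel patch under discussion: wait_ready_bounded(), ready_fn and ready_after_n_polls() are hypothetical names, and nanosleep() stands in for the kernel's msleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical readiness probe: returns true once the RNG is usable. */
typedef bool (*ready_fn)(void *ctx);

/*
 * Poll ready() every step_ms milliseconds, but give up once roughly
 * timeout_ms milliseconds have been spent sleeping.
 * Returns 0 when ready, -1 on timeout, so the caller can never block forever.
 */
int wait_ready_bounded(ready_fn ready, void *ctx, long timeout_ms, long step_ms)
{
    long waited = 0;

    while (!ready(ctx)) {
        if (waited >= timeout_ms)
            return -1;                      /* bounded: report failure */

        struct timespec ts = {
            .tv_sec  = step_ms / 1000,
            .tv_nsec = (step_ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);               /* user-space stand-in for msleep() */
        waited += step_ms;
    }
    return 0;
}

/* Example probe (hypothetical): reports "not ready" for the first *ctx polls. */
bool ready_after_n_polls(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return false;
    }
    return true;
}
```

In a real user-space program the probe could wrap getrandom() with GRND_NONBLOCK and treat an EAGAIN error as "not ready yet"; on a -1 return the caller decides what to do, which is the point of having a timeout at all.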
> Or we do something like intentionally triggering readahead on some
> offset on the root block device.
You don't necessarily have such a device, especially when you're
in an initramfs. This is precisely where userland can be smarter. When
the caller is sfdisk, for example, it has a much better chance of
performing I/O than a tiny HTTP server starting up to present
a configuration page.
> We should definitely not trigger *blocking* IO.
I think I agree.
> Also, I wonder if the real problem preventing the RNG from starting up
> is that the crng_init_cnt threshold is too high.  We have a rather
> baroque accounting system, and it seems like we can accumulate and
> credit entropy for a very long time indeed without actually
> considering ourselves done.
I have no opinion on this, lacking the skills to evaluate the situation.
What I can say for sure is that I've faced the non-booting issue quite a
number of times on headless systems, and conversely, in the 2.4 era, my
front reverse proxy at the time had the same SSH key as 89 other machines
on the net. So there's surely a sweet spot to find between those two extremes.
I tend to think that waiting *a little bit* for the *first* random number
is acceptable, even 10-15s: by the time the user starts to think about
pressing the reset button, the system might have finished booting. Hashing
some RAM locations and the RTC, when present, can also help a little bit.
If my machine back then had at least combined the RTC's date and time with
the hash, the chances of a key collision would have gone down to one in
many thousands.
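
To make the RTC point concrete, here is a minimal sketch. It assumes FNV-1a as a stand-in mixer, chosen only for brevity: it is not cryptographic and is not what the kernel does. boot_seed_sketch() and fnv1a_update() are hypothetical names. Two machines with byte-identical RAM contents but clocks one second apart end up with different seeds.

```c
#include <stddef.h>
#include <stdint.h>

/* One FNV-1a pass over n bytes, continuing from state h (64-bit variant). */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;              /* 64-bit FNV prime */
    }
    return h;
}

/*
 * Hypothetical early-boot seed: hash some RAM contents, then fold in the
 * RTC reading. The time is serialized least-significant byte first so the
 * result does not depend on host endianness.
 */
uint64_t boot_seed_sketch(const unsigned char *ram, size_t ram_len,
                          uint64_t rtc_seconds)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* 64-bit FNV offset basis */
    unsigned char t[8];

    h = fnv1a_update(h, ram, ram_len);
    for (int i = 0; i < 8; i++)
        t[i] = (unsigned char)(rtc_seconds >> (8 * i));
    return fnv1a_update(h, t, sizeof t);
}
```

The same RAM image with the same timestamp still yields the same seed, so this only lowers the odds of a collision between identically installed machines; it does not replace real entropy.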

Willy