Thread (23 messages), 7 authors, 2019-09-21

Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()

From: Andy Lutomirski <luto@amacapital.net>
Date: 2019-09-20 19:52:35
Also in: linux-ext4, linux-man, lkml

On Sep 20, 2019, at 12:37 PM, Willy Tarreau [off-list ref] wrote:

On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
quoted
Perhaps userland could register a helper that takes over and does
something better?
If userland sees the failure it can do whatever the developer/distro
packager thought suitable for the system facing this condition.
quoted
But I think the kernel really should do something
vaguely reasonable all by itself.
Definitely, that's what Linus' proposal was doing. Sleeping for some time
is what I call "vaguely reasonable".
I don’t buy it. We have existing programs that can deadlock on boot. Just throwing -EAGAIN at them in a syscall that didn’t previously block does not strike me as reasonable.
quoted
If nothing else, we want the ext4
patch that provoked this whole discussion to be applied,
Oh absolutely!
quoted
which means
that we need to unbreak userspace somehow, and returning garbage to it
is not a good choice.
It depends how it's used. I'd claim that we certainly use randoms for
other things (such as ASLR/hashtables) *before* using them to generate
long lived keys thus we can have a bit more time to get some more
entropy before reaching the point of producing these keys.
The problem is that we don’t know what userspace is doing with the output from getrandom(..., 0), so I think we have to be conservative. New kernels need to work with old user code. It’s okay if they’re slower to boot than they could be.
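The distinction at stake can be seen from userspace. What follows is a minimal sketch, assuming glibc 2.25 or later for the <sys/random.h> wrapper; fill_random_nonblock() is a hypothetical helper name and is not part of the patch under discussion. A caller that passes GRND_NONBLOCK has opted in to seeing EAGAIN before the CRNG is initialised, whereas callers of getrandom(..., 0) were written against a call that blocks instead and may have no handling for that error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the thread): fill buf without ever blocking.
 * Returns 0 on success, -EAGAIN if the kernel CRNG is not yet initialised,
 * or another negative errno value on failure.
 */
static int fill_random_nonblock(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		/* GRND_NONBLOCK: fail with EAGAIN instead of sleeping. */
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -errno;
		}
		/* getrandom() may return fewer bytes than requested. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A program using such a helper decides for itself what to do on -EAGAIN (wait, degrade, or fail), which is exactly the decision that legacy getrandom(..., 0) callers never had to make.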
quoted
Here are some possible approaches that come to mind:

int count;
while (crng isn't inited) {
 msleep(1);
}

and modify add_timer_randomness() to at least credit a tiny bit to
crng_init_cnt.
Without a timeout it's sure we'll still face some situations where
it blocks forever, which is the current problem.
The point is that we keep the timer running by looping like this, which should cause add_timer_randomness() to get called continuously, which should prevent the deadlock.  I assume the deadlock is because we go into nohz-idle and we sit there with nothing happening at all.
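This reasoning can be modelled with a toy userspace simulation. It is only a sketch under stated assumptions: every 1 ms sleep is assumed to produce one timer event, each timer event is assumed to credit a fixed small amount, and the threshold is made up. All identifiers (sim_crng, SIM_INIT_THRESHOLD, sim_timer_tick, sim_wait_for_crng) are hypothetical and are not kernel symbols.

```c
#include <stdbool.h>

/* Made-up number of credits needed before the simulated CRNG is "ready". */
#define SIM_INIT_THRESHOLD 64

/* Hypothetical stand-in for kernel state; plays the role of crng_init_cnt. */
struct sim_crng {
	unsigned int init_cnt;
};

static bool sim_crng_ready(const struct sim_crng *c)
{
	return c->init_cnt >= SIM_INIT_THRESHOLD;
}

/* Models a timer event during msleep(1) crediting a tiny bit of entropy. */
static void sim_timer_tick(struct sim_crng *c, unsigned int credit)
{
	c->init_cnt += credit;
}

/*
 * Models the polling loop: sleep 1 ms, let the timer credit entropy, repeat.
 * Returns the number of simulated sleeps needed, or -1 if max_ticks elapse
 * without the CRNG becoming ready (the "nothing ever happens" case).
 */
static long sim_wait_for_crng(struct sim_crng *c, unsigned int credit_per_tick,
			      long max_ticks)
{
	long ticks = 0;

	while (!sim_crng_ready(c)) {
		if (ticks >= max_ticks)
			return -1;
		sim_timer_tick(c, credit_per_tick);
		ticks++;
	}
	return ticks;
}
```

With a non-zero credit per tick the loop always terminates after a bounded number of sleeps; with zero credit it never makes progress, which corresponds to the situation where the timer events are not credited at all.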
quoted
Or we do something like intentionally triggering readahead on some
offset on the root block device.
You don't necessarily have such a device, especially when you're
in an initramfs. It's precisely where userland can be smarter. When
the caller is sfdisk for example, it does have more chances to try
to perform I/O than when it's a tiny http server starting to present
a configuration page.
What I mean is: allow user code to register a usermode helper that helps get entropy. Or just convince distros to bundle some useful daemon that starts at early boot and lives in the initramfs.