Thread (75 messages) 75 messages, 7 authors, 2021-07-24

Re: [RFC PATCH 00/11] nvme: In-band authentication support

From: Hannes Reinecke <hare@suse.de>
Date: 2021-07-21 06:07:17
Also in: linux-nvme

On 7/20/21 10:26 PM, Vladislav Bolkhovitin wrote:
Hi,

Great to see those patches coming! After some review, they look to be
very well done. Some comments/suggestions below.

1. I strongly recommend to implement DH exponentials reuse (g x mod p /
g y mod p as well as g xy mod p) as specified in section 8.13.5.7
"DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I
had a prototype that demonstrated that DH math has quite significant
latency, something like (as far as I remember) 30ms for 4K group and few
hundreds of ms for 8K group. For single connection it is not a big deal,
but imagine AMD EPYC with 128 cores. Since all connections are created
sequentially, even with 30 ms per connection time to complete full
remote device connection would become 128*30 => almost 4 seconds. With
8K group it might be more than 10 seconds. Users are unlikely going to
be happy with this, especially in cases, when connecting multiple of
NVMe-oF devices is a part of a server or VM boot sequence.
Oh, indeed, I can confirm that. FFDHE calculations are quite time-consuming.
But incidentally, ECDH and curve25519 are reasonably fast, so maybe
there _is_ a value in having a TPAR asking for them to be specified, too ...
If DH exponential reuse implemented, for all subsequent connections the
DH math is excluded, so authentication overhead becomes pretty much
negligible.

In my prototype I implemented DH exponential reuse as a simple
per-host/target cache that keeps DH exponentials (including g xy mod p)
for up to 10 seconds. Simple and sufficient.
Frankly, I hadn't looked at exponential reuse; this implementation
really is just a first step to get feedback from people if this is a
direction they want to go.
Another, might be ever more significant reason why DH exponential reuse
is important is that without it x (or y on the host side) must always be
randomly generated each time a new connection is established. Which
means, for instance, for 8K groups for each connection 1KB of random
bytes must be taken from the random pool. With 128 connections it is now
128KB. Quite a big pressure on the random pool that DH exponential reuse
mostly avoids.

Those are the 2 reasons why we added this DH exponential reuse sentence
in the spec. In the original TP 8006 there was a small informative piece
explaining reasonings behind that, but for some reasons it was removed
from the final version.
Thanks for the hint. I'll be adding exponential reuse to the code.
2. What is the status of this code from perspective of stability in face
of malicious host behavior? Seems implementation is carefully done, but,
for instance, at the first look I was not able to find a code to clean
up if host in not acting for too long in the middle of exchange. Other
observation is that in nvmet_execute_auth_send()
nvmet_check_transfer_len() does not check if tl size is reasonable,
i.e., for instance, not 1GB.
That is true; exchange timeouts are missing. Will be adding them, of
course. And haven't thought of checking for tl size overflows; will be
adding them, too.
For sure, we don't want to allow remote hosts to hang or crash target.
For instance, because of OOM conditions that happened, because malicious
host asked target to allocate too much memory or open to many being
authenticated connections in which the host is not going to reply in the
middle of exchange.
This is something I'll need to look at, anyway. What we do not want is a
userspace application chipping in and send a 'negotiate' command without
any subsequent steps, thereby invalidating the existing authentication.
Asking, because don't want to go in my review too far ahead from the
author ;)

In this regard, it would be great if you add in your test application
ability to perform authentication with random parameters and randomly
stop responding. Overnight running of such test would give us good
degree of confidence that it will always work as expected.
That indeed would be good; let me think on how something like that can
be implemented.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help