Re: [RFC PATCH 00/11] nvme: In-band authentication support
From: Hannes Reinecke <hare@suse.de>
Date: 2021-07-21 06:07:17
On 7/20/21 10:26 PM, Vladislav Bolkhovitin wrote:
Hi, Great to see those patches coming! After some review, they look to be very well done. Some comments/suggestions below.
1. I strongly recommend implementing DH exponentials reuse (g^x mod p / g^y mod p as well as g^xy mod p) as specified in section 8.13.5.7 "DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I had a prototype that demonstrated that DH math has quite significant latency, something like (as far as I remember) 30 ms for the 4K group and a few hundred ms for the 8K group. For a single connection it is not a big deal, but imagine an AMD EPYC with 128 cores. Since all connections are created sequentially, even at 30 ms per connection the time to complete a full remote device connection would become 128 * 30 ms = 3840 ms, almost 4 seconds. With the 8K group it might be more than 10 seconds. Users are unlikely to be happy with this, especially when connecting multiple NVMe-oF devices is part of a server or VM boot sequence.
Oh, indeed, I can confirm that. FFDHE calculations are quite time-consuming. But incidentally, ECDH and curve25519 are reasonably fast, so maybe there _is_ a value in having a TPAR asking for them to be specified, too ...
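The estimate quoted above is simple enough to check with a few lines. This is only a back-of-the-envelope sketch: it assumes one connection per core, strictly sequential setup, and the 30 ms figure that was quoted from memory; the function name is made up for illustration.

```python
def total_connect_time_s(num_connections: int, dh_latency_ms: float) -> float:
    """Total setup time when every connection pays the full DH cost in sequence."""
    return num_connections * dh_latency_ms / 1000.0


# 128 connections at 30 ms each (4K group, figure quoted from memory).
four_k_total = total_connect_time_s(128, 30)
print(f"4K group: {four_k_total:.2f} s")
```

Any per-connection latency above roughly 78 ms already pushes the 128-connection total past the 10 seconds mentioned for the 8K group.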
If DH exponential reuse is implemented, the DH math is skipped for all subsequent connections, so authentication overhead becomes pretty much negligible. In my prototype I implemented DH exponential reuse as a simple per-host/target cache that keeps DH exponentials (including g^xy mod p) for up to 10 seconds. Simple and sufficient.
Frankly, I hadn't looked at exponential reuse; this implementation really is just a first step to get feedback from people if this is a direction they want to go.
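For illustration, the per-host/target cache described in the quoted text could look roughly like the sketch below. This is not the prototype's code and not kernel code: the class and function names, the cache layout, and the injectable clock are all assumptions, and the toy group parameters in the usage note are far too small for real use. The only values taken from the mail are the 10-second lifetime and the cached quantities (x, g^x mod p, g^xy mod p).

```python
import secrets
import time

TTL_SECONDS = 10.0  # lifetime mentioned in the mail; everything else is hypothetical


class DhExponentialCache:
    """Per host/target cache of DH exponentials with a fixed lifetime."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # (host, target) -> (expires_at, values dict)

    def lookup(self, host, target):
        key = (host, target)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, values = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale entry: force a fresh exchange
            return None
        return values

    def store(self, host, target, values):
        self._entries[(host, target)] = (self._clock() + self._ttl, values)


def shared_secret(cache, host, target, g, p, peer_public):
    """Return g^xy mod p, doing the expensive math only on a cache miss."""
    cached = cache.lookup(host, target)
    if cached is not None and cached["peer_public"] == peer_public:
        return cached["g_xy"]  # peer reused its exponential too: no DH math
    x = secrets.randbelow(p - 3) + 2  # fresh private exponent in [2, p - 2]
    values = {
        "x": x,
        "g_x": pow(g, x, p),            # our public value, g^x mod p
        "peer_public": peer_public,     # peer's g^y mod p
        "g_xy": pow(peer_public, x, p)  # shared secret, g^xy mod p
    }
    cache.store(host, target, values)
    return values["g_xy"]
```

With a toy group such as p = 23, g = 5, two back-to-back calls for the same host/target pair and the same peer value return the same secret, and only the first call draws random bytes or performs a modular exponentiation.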
Another, possibly even more significant, reason why DH exponential reuse is important is that without it x (or y on the host side) must be randomly generated each time a new connection is established. This means, for instance, that for 8K groups 1 KB of random bytes must be taken from the random pool for each connection. With 128 connections that becomes 128 KB. That is quite a lot of pressure on the random pool, which DH exponential reuse mostly avoids. Those are the 2 reasons why we added this DH exponential reuse sentence to the spec. In the original TP 8006 there was a small informative piece explaining the reasoning behind that, but for some reason it was removed from the final version.
Thanks for the hint. I'll be adding exponential reuse to the code.
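The random-pool numbers from the quoted paragraph can be reproduced the same way. The sketch follows the mail's own assumption that each fresh exponent is as large as the group modulus (8192 bits = 1 KB for the 8K group), and it further assumes that with reuse only the first connection inside the cache lifetime draws a new exponent; the function name is hypothetical.

```python
def random_bytes_needed(group_bits: int, connections: int, reuse: bool = False) -> int:
    """Bytes taken from the random pool to generate private DH exponents."""
    bytes_per_exponent = group_bits // 8
    # With reuse, one exponent serves all connections made within its lifetime.
    fresh_exponents = 1 if reuse and connections > 0 else connections
    return bytes_per_exponent * fresh_exponents


without_reuse = random_bytes_needed(8192, 128)           # 128 * 1 KB
with_reuse = random_bytes_needed(8192, 128, reuse=True)  # a single 1 KB draw
print(without_reuse // 1024, "KB vs", with_reuse // 1024, "KB")
```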
2. What is the status of this code from the perspective of stability in the face of malicious host behavior? The implementation seems carefully done, but, for instance, at first look I was not able to find code to clean up if the host is not acting for too long in the middle of an exchange. Another observation is that in nvmet_execute_auth_send(), nvmet_check_transfer_len() does not check whether the tl size is reasonable, i.e., for instance, not 1GB.
That is true; exchange timeouts are missing. Will be adding them, of course. And haven't thought of checking for tl size overflows; will be adding them, too.
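As a rough illustration of the two missing checks, here is a small model in Python rather than kernel C. Nothing in it comes from the patches: the cap on the transfer length, the timeout value, and all names are hypothetical placeholders, and the timeout is measured from the start of the exchange purely to keep the sketch short.

```python
import time

MAX_AUTH_TRANSFER_LEN = 4096  # hypothetical cap, not taken from the spec or patches
EXCHANGE_TIMEOUT_S = 5.0      # hypothetical limit for one authentication exchange


class AuthExchange:
    """Tracks one in-progress authentication exchange."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.started_at = clock()
        self.step = 0

    def expired(self) -> bool:
        return self._clock() - self.started_at > EXCHANGE_TIMEOUT_S


def transfer_len_ok(tl: int) -> bool:
    """Reject empty or unreasonably large payloads before allocating anything."""
    return 0 < tl <= MAX_AUTH_TRANSFER_LEN


def handle_auth_send(exchange: AuthExchange, tl: int) -> str:
    if not transfer_len_ok(tl):
        return "reject: bad transfer length"
    if exchange.expired():
        return "reject: exchange timed out"
    exchange.step += 1
    return "ok"
```

The point of ordering the checks this way is that a 1GB tl is refused before any buffer is sized from it, and a host that goes silent mid-exchange leaves behind state that can be recognized as stale and cleaned up.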
For sure, we don't want to allow remote hosts to hang or crash the target, for instance because of OOM conditions caused by a malicious host asking the target to allocate too much memory, or to open too many connections under authentication in which the host never replies in the middle of the exchange.
This is something I'll need to look at, anyway. What we do not want is a userspace application chipping in and sending a 'negotiate' command without any subsequent steps, thereby invalidating the existing authentication.
Asking because I don't want to go too far ahead of the author in my review ;) In this regard, it would be great if you added to your test application the ability to perform authentication with random parameters and to randomly stop responding. Running such a test overnight would give us a good degree of confidence that it will always work as expected.
That indeed would be good; let me think on how something like that can be implemented.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke, Kernel Storage Architect
hare@suse.de, +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer