Re: [PATCH v2 6/7] nvme-tcp: Support KeyUpdate
From: Alistair Francis <hidden>
Date: 2025-09-17 03:14:42
Also in:
linux-doc, linux-nfs, linux-nvme, lkml
On Tue, Sep 16, 2025 at 11:04 PM Hannes Reinecke [off-list ref] wrote:
On 9/5/25 04:46, alistair23@gmail.com wrote:quoted
From: Alistair Francis <redacted> If the nvme_tcp_try_send() or nvme_tcp_try_recv() functions return EKEYEXPIRED then the underlying TLS keys need to be updated. This occurs on an KeyUpdate event. If the NVMe Target (TLS server) initiates a KeyUpdate this patch will allow the NVMe layer to process the KeyUpdate request and forward the request to userspace. Userspace must then update the key to keep the connection alive. This patch allows us to handle the NVMe target sending a KeyUpdate request without aborting the connection. At this time we don't support initiating a KeyUpdate. Link: https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.3 Signed-off-by: Alistair Francis <redacted> --- v2: - Don't change the state - Use a helper function for KeyUpdates - Continue sending in nvme_tcp_send_all() after a KeyUpdate - Remove command message using recvmsg drivers/nvme/host/tcp.c | 73 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 3 deletions(-)diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 776047a71436..b6449effc2ac 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c@@ -171,6 +171,7 @@ struct nvme_tcp_queue { bool tls_enabled; u32 rcv_crc; u32 snd_crc; + key_serial_t user_session_id; __le32 exp_ddgst; __le32 recv_ddgst; struct completion tls_complete;@@ -210,6 +211,7 @@ static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl, struct nvme_tcp_queue *queue, key_serial_t pskid, handshake_key_update_type keyupdate); +static void update_tls_keys(struct nvme_tcp_queue *queue); static inline struct nvme_tcp_ctrl *to_tcp_ctrl(struct nvme_ctrl *ctrl) {@@ -393,6 +395,14 @@ static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue) do { ret = nvme_tcp_try_send(queue); } while (ret > 0); + + if (ret == -EKEYEXPIRED) { + update_tls_keys(queue); + + do { + ret = nvme_tcp_try_send(queue); + } while (ret > 0); + } } static inline bool nvme_tcp_queue_has_pending(struct nvme_tcp_queue *queue)@@ -1347,6 +1357,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue) done: if (ret == -EAGAIN) { ret = 0; + } else if (ret == -EKEYEXPIRED) { + goto out; } else if (ret < 0) { dev_err(queue->ctrl->ctrl.device, "failed to send request %d\n", ret);@@ -1371,9 +1383,56 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue) queue->nr_cqe = 0; consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb); release_sock(sk); + + /* If we received EINVAL from read_sock then it generally means the + * other side sent a command message. So let's try to clear it from + * our queue with a recvmsg, otherwise we get stuck in an infinite + * loop. + */ + if (consumed == -EINVAL) { + char cbuf[CMSG_LEN(sizeof(char))] = {}; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT }; + struct bio_vec bvec; + + bvec_set_virt(&bvec, (void *)cbuf, sizeof(cbuf)); + iov_iter_bvec(&msg.msg_iter, ITER_DEST, &bvec, 1, sizeof(cbuf)); + + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + + consumed = sock_recvmsg(sock, &msg, msg.msg_flags); + } + return consumed == -EAGAIN ? 0 : consumed; } +static void update_tls_keys(struct nvme_tcp_queue *queue) +{ + int qid = nvme_tcp_queue_id(queue); + int ret; + + dev_dbg(queue->ctrl->ctrl.device, + "updating key for queue %d\n", qid); + + cancel_work(&queue->io_work); + handshake_req_cancel(queue->sock->sk); + handshake_sk_destruct_req(queue->sock->sk); +Careful here. The RFC fully expects to have several KeyUpdate requests pending (eg if both sides decide so initiate a KeyUpdate at the same time). And cancelling a handshake request would cause tlshd/gnutls to lose track of the generation counter and generate an invalid traffic secret. I would just let it rip and don't bother with other handshake requests.
Unfortunately that doesn't work as future calls to `handshake_req_hash_add()` will fail. I now think that's a bug in `handshake_complete()` and I have a better fix in the next version.
quoted
+ nvme_stop_keep_alive(&(queue->ctrl->ctrl)); + flush_work(&(queue->ctrl->ctrl).async_event_work); +Oh bugger. Seems like gnutls is generating the KeyUpdate message itself, and we have to wait for that.
Yes, we have gnutls generate the message.
So much for KeyUpdate being transparent without having to stop I/O... Can't we fix gnutls to make sending the KeyUpdate message and changing the IV parameters an atomic operation? That would be a far better
I'm not sure I follow. ktls-utils will first restore the gnutls session. Then have gnutls trigger a KeyUpdate.gnutls will send a KeyUpdate and then tell the kernel the new keys. The kernel cannot send or encrypt any data after the KeyUpdate has been sent until the keys are updated. I don't see how we could make it an atomic operation. We have to stop the traffic between sending a KeyUpdate and updating the keys. Otherwise we will send invalid data.
interface, as then we would not need to stop I/O and the handshake process could run fully asynchronous to normal I/O...quoted
+ ret = nvme_tcp_start_tls(&(queue->ctrl->ctrl), + queue, queue->ctrl->ctrl.tls_pskid, + HANDSHAKE_KEY_UPDATE_TYPE_RECEIVED); + + if (ret < 0) { + dev_err(queue->ctrl->ctrl.device, + "failed to update the keys %d\n", ret); + nvme_tcp_fail_request(queue->request); + nvme_tcp_done_send_req(queue); + } +} + static void nvme_tcp_io_work(struct work_struct *w) { struct nvme_tcp_queue *queue =@@ -1389,15 +1448,21 @@ static void nvme_tcp_io_work(struct work_struct *w) mutex_unlock(&queue->send_mutex); if (result > 0) pending = true; - else if (unlikely(result < 0)) + else if (unlikely(result < 0)) { + if (result == -EKEYEXPIRED) + update_tls_keys(queue);How exactly can we get -EKEYEXPIRED when _sending_?
Good point. You can't with this current patch set. I have patches on top of this that will generate a KeyUpate as part of the send operation, which I plan to submit after this series. So this is a bit of prep work to setup the NVMe layer to handle sending and receiving KeyUpdate requests. I can drop this change from the series if that's prefered?
To my understanding that would have required userspace to intercept here trying (or even sending) a KeyUpdate message, right?
Not necessarily. The TLS layer can trigger a KeyUpdate independent of userspace. This would happen for example if the sequence count was about to overflow, which is what I use in my testing. Userspace has no idea of the current sequence number, so it can't be involved. The kernel will need to start the KeyUpdate send if the rec_seq is about to overflow.
So really not something we should see during normal operation. As mentioned in my previous mail we should rather code the KeyUpdate process itself here, too. Namely: Trigger the KeyUpdate via userspace (eg by writing into the tls_key attribute for the controller), and then have the kernel side to call out into tlshd to initiate the KeyUpdate 'handshake'.
Yeah, I agree about exposing a way for userspace to trigger an update. That would only be for testing though, as in normal operation userspace has no insight into the current connection state. In a production system the kernel TLS layer will need to initiate a KeyUpdate.
That way we have identical flow of control for both the sending and receiving side. Incidentally: the RFC has some notion about 'request_update' setting in the KeyUpdate message. Is that something we have to care about at this level?
It is something we will need to care about. At this stage it isn't supported as it adds a little bit of complexity, but I should be able to extend the current approach to support a request_update. Alistair
Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich