Re: [PATCH v14 0/9] tls: Add TLS 1.3 hardware offload support
From: Nils Juenemann <hidden>
Date: 2026-06-23 17:53:40
Hi Rishikesh, all,

we have been testing the v14 TLS 1.3 HW offload series on a ConnectX-6 DX
and hit a sendfile() final-record loss on the device TX path. We reduced
it to a self-contained C reproducer and characterized it; reporting it
here with the analysis and a question on where a fix belongs.

Setup:

  - NIC: ConnectX-6 DX (crypto enabled), FW 22.47.1026, SR-IOV VF,
    TX offload only
  - Kernel: net-next + this v14 series
  - TLS 1.3, AES-128-GCM, kTLS installed via setsockopt(TLS_TX) on the
    sending side with fixed test crypto material and no handshake, like
    tools/testing/selftests/net/tls
  - a server sends a file with the raw sendfile(2) syscall; a client on
    another host reads the decrypted stream and counts the bytes

Trigger:

sendfile(2) with a count larger than the bytes remaining in the file
(count > EOF). This is what a generic copy loop / Go's
net.TCPConn.ReadFrom passes for a file of unknown length (~2 GiB). The
kernel sends up to EOF, but the connection's final TLS record then
appears not to be put on the wire unless a subsequent write flushes it.
An abrupt close() appears to drop it, and the peer receives the whole
body except the last record's bytes.

Reproducer results (two hosts over the ConnectX; a loopback/same-host
connection stays on TLS_SW and does not show it). Same file, 226965
bytes (= 13*16384 + 13973):

  TLS_HW  count>EOF     close()                      -> 212992  short
  TLS_HW  count>EOF     close(), no zerocopy         -> 212992  same
  TLS_HW  count==exact  close()                      -> 226965  full
  TLS_HW  count>EOF     close_notify, then close()   -> 226965  full
  TLS_SW  count>EOF     close(), hw-tx-offload off   -> 226965  full

The short count, 212992, is exactly 13*16384, so the peer gets the 13
full records and loses only the final 13973-byte partial record.

So it is specific to the device-offload TX path: the final record of a
count > EOF sendfile() appears not to be finalized/flushed at EOF, only
by a following write. A bounded count, a trailing write (close_notify),
or software kTLS all avoid it. TLS_TX_ZEROCOPY_RO makes no difference.
We are currently using the exact-count workaround in a preview
environment.
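
For reference, a minimal sketch of what the exact-count workaround looks
like. This is not taken from the reproducer: the helper names
clamp_to_eof() and sendfile_exact() are hypothetical, and it assumes a
blocking socket and a file whose size does not change while it is being
sent. The idea is only to cap each sendfile(2) count at the bytes left
before EOF, so that the last call never asks for more than the file
holds:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Largest count a single sendfile(2) call transfers on Linux. */
#define SENDFILE_MAX ((size_t)0x7ffff000)

/*
 * Hypothetical helper: number of bytes to request from 'offset',
 * never more than what is left before EOF and never more than
 * 'requested'.
 */
static size_t clamp_to_eof(off_t file_size, off_t offset, size_t requested)
{
	assert(file_size >= 0 && offset >= 0);

	if (offset >= file_size)
		return 0;

	uint64_t remaining = (uint64_t)(file_size - offset);

	return remaining < requested ? (size_t)remaining : requested;
}

/*
 * Hypothetical helper: send [offset, EOF) of file_fd to sock_fd with a
 * bounded count. Returns the bytes sent, or -1 with errno set.
 * Assumes sock_fd is blocking (no EAGAIN handling).
 */
ssize_t sendfile_exact(int sock_fd, int file_fd, off_t offset)
{
	struct stat st;
	ssize_t total = 0;

	if (fstat(file_fd, &st) < 0)
		return -1;

	for (;;) {
		size_t count = clamp_to_eof(st.st_size, offset, SENDFILE_MAX);

		if (count == 0)
			break;	/* reached EOF as of the fstat() above */

		/* sendfile() advances 'offset' by the bytes it sent. */
		ssize_t n = sendfile(sock_fd, file_fd, &offset, count);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* file shrank underneath us */
		total += n;
	}
	return total;
}
```

With the 226965-byte test file this requests 226965 bytes rather than
~2 GiB, which corresponds to the count==exact row in the results above.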

We may be misreading the code, so this is only a pointer: with
count > EOF tls_push_data() fills the last record without reaching the
size==0 case; on the device path tls_device_record_close() for that
pending record appears to run only on the next push, and an abrupt
teardown appears to discard it. The software path seems to flush pending
TX records on close (tls_sw_release_resources_tx), which would explain
why it is unaffected.

Reproducer:

  https://gist.github.com/totallyunknown/a8f0ad3c54e40befde2f5a8d360fa6be

It installs kTLS with fixed test crypto material via
setsockopt(TLS_TX/TLS_RX), sends a file using the raw sendfile(2)
syscall, and compares count > EOF against exact-count and close_notify.

The v14 selftest (patch 9/9) sends via send() only and ends cleanly, so
it misses this; a sendfile() + count > EOF case reproduces it
deterministically for us.

Question: should the device offload finalize and flush the connection's
final record at EOF / on close, the way software kTLS does, or is a
trailing write required by contract? And should a fix live in net/tls
(device record close on the final partial record / the close path) or on
the mlx5 side?

Thanks,
Nils Juenemann