RE: [PATCH v25 00/20] nvme-tcp receive offloads
From: David Laight <hidden>
Date: 2024-06-15 21:35:48
Also in: linux-nvme
From: Christoph Hellwig
Sent: 11 June 2024 07:42

> On Mon, Jun 10, 2024 at 05:30:34PM +0300, Sagi Grimberg wrote:
> > > efficient header splitting in the NIC, either hard coded or even
> > > better downloadable using something like eBPF.
> >
> > From what I understand, this is what this offload is trying to do. It
> > uses the nvme command_id similar to how the read_stag is used in iwarp,
> > it tracks the NVMe/TCP pdus to split pdus from data transfers, and maps
> > the command_id to an internal MR for dma purposes.
> >
> > What I think you don't like about this is the interface that the offload
> > exposes to the TCP ulp driver (nvme-tcp in our case)?
>
> I don't see why a memory registration is needed at all.
>
> The by far biggest painpoint when doing storage protocols (including
> file systems) over IP based storage is the data copy on the receive
> path because the payload is not aligned to a page boundary.

How much does the copy cost anyway?
If the hardware has merged the segments then it should be a single copy.
On x86 (does anyone care about anything else :-) 'rep movsb' with a
cache-line aligned destination runs at 64 bytes/clock.
(The source alignment doesn't matter at all.)
I guess it loads the source data into the D-cache, the target is probably
required anyway - or you wouldn't be doing a read.

	David

> So we need to figure out a way that is as stateless as possible that
> allows aligning the actual data payload on a page boundary in an
> otherwise normal IP receive path.
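
To put the copy-cost question in numbers, here is a rough back-of-envelope sketch. It takes the 64 bytes/clock 'rep movsb' figure from the reply at face value and assumes a 3 GHz core clock, which is not stated anywhere in the thread. The function names are made up for illustration. It is a best-case estimate only: it ignores instruction startup cost, cache misses and memory bandwidth limits.

```python
# Best-case copy cost, assuming the 64 bytes/clock figure quoted in the
# reply and a hypothetical 3 GHz core clock (an assumption, not a measurement).

BYTES_PER_CLOCK = 64        # claimed 'rep movsb' rate, cache-line aligned destination
CLOCK_HZ = 3_000_000_000    # assumed core frequency


def copy_clocks(nbytes, bytes_per_clock=BYTES_PER_CLOCK):
    """Clocks needed to copy nbytes, rounded up to a whole clock."""
    return -(-nbytes // bytes_per_clock)


def copy_seconds(nbytes, clock_hz=CLOCK_HZ, bytes_per_clock=BYTES_PER_CLOCK):
    """Best-case wall time for the copy at the assumed clock rate."""
    return copy_clocks(nbytes, bytes_per_clock) / clock_hz


def peak_bytes_per_second(clock_hz=CLOCK_HZ, bytes_per_clock=BYTES_PER_CLOCK):
    """Upper bound on copy throughput for one core."""
    return clock_hz * bytes_per_clock


if __name__ == "__main__":
    for size in (4096, 65536, 1 << 20):
        ns = copy_seconds(size) * 1e9
        print(f"{size:>8} bytes: {copy_clocks(size):>6} clocks, {ns:.1f} ns")
    print(f"peak: {peak_bytes_per_second() / 1e9:.0f} GB/s per core")
```

Under these assumptions a 4 KiB page costs 64 clocks, so whether the copy matters depends mostly on how far real hardware falls short of that rate once the data has to come from memory rather than cache.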