Re: [RFC]: Support for zero-copy TCP transmit of user space data
From: Vladislav Bolkhovitin <hidden>
Date: 2008-12-19 19:18:10
Also in:
linux-mm, linux-scsi, lkml
Jens Axboe, on 12/19/2008 10:07 PM wrote:
On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:quoted
David M. Lloyd, on 12/18/2008 09:43 PM wrote:quoted
On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:quoted
An iSCSI target driver iSCSI-SCST was a part of the patchset (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to have TCP zero-copy transmit of user space data was implemented. Patch, implementing this optimization was also sent in the patchset, see http://lkml.org/lkml/2008/12/10/296.I'm probably ignorant of about 90% of the context here, but isn't this the sort of problem that was supposed to have been solved by vmsplice(2)?No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, even if it was a user space driver, vmsplice wouldn't change anything much. It doesn't have a possibility for a user to know, when transmission of the data finished. So, it is intended to be used as: vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> vmsplice() it. But on the mmap() stage kernel has to zero all the newly mapped pages and zeroing memory isn't much faster, than copying it. Hence, there would be no considerable performance increase.vmsplice() isn't the right choice, but splice() very well could be. You could easily use splice internally as well. The vmsplice() part sort-of applies in the sense that you want to fill pages into a pipe, which is essentially what vmsplice() does. You'd need some helper to do that.
Sorry, Jens, but splice() works only if there is a file handle on the another side, so user space doesn't see data buffers. But SCST needs to serve a wider usage cases, like reading data with decompression from a virtual tape, where decompression is done in user space. For those only complete zero-copy network send, which I implemented, can give the best performance.
And the ack-on-xmit-done bits is something that splice-to-socket needs anyway, so I think it'd be quite a suitable choice for this.
So, are you writing that splice() could also benefit from the zero-copy transmit feature, like I implemented? Thanks, Vlad