Re: [PATCH net-next] tls: Add opt-in zerocopy mode of sendfile()
From: Maxim Mikityanskiy <hidden>
Date: 2022-05-03 18:57:09
On 2022-04-29 22:11, Jakub Kicinski wrote:
On Fri, 29 Apr 2022 17:21:59 +0300 Maxim Mikityanskiy wrote:quoted
On 2022-04-29 01:11, Jakub Kicinski wrote:quoted
quoted
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index b12f81a2b44c..715401b20c8b 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c@@ -411,10 +411,16 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i) return 0; } +union tls_iter_offset { + struct iov_iter *msg_iter; + int offset; +};Is this sort of atrocity used elsewhere in the kernel? If you can't refactor the code you can pack args into a structureWhat's the point of packing arguments into a struct in this particular case? Depending on zc_page, I need either msg_iter or offset, and I'm reusing the same CPU register to pass either of them. The performance isn't affected, and the amount of memory used is the same. A struct won't allow to achieve this, it would force me to drag 8 extra bytes, but we already use all 6 registers used to pass parameters on x86_64.I know why you're doing this, but you're not writing assembly: + rc = tls_push_data(sk, (union tls_iter_offset)&msg_iter, size, + return tls_push_data(sk, (union tls_iter_offset)&msg_iter, 0, flags, even if it's legal C (i.e. not UB) it looks awkward.
It's not UB, cast between a union and its element type is well-defined in C.
quoted
quoted
but I've not seen people cast mutually exclusive arguments to a union type.It's the purpose of a union, to hold one of mutually exclusive values, isn't it?The union itself is not the problem.quoted
quoted
Is this "inspired" by some higher level language?It's unfortunately inspired by C and its freedom to allow microoptimizations/hacks. The hack here is that I use a pointer being NULL or not-NULL as an indicator what type the other argument has. The closest alternative from high-level languages I can think of is enums with attached data from rust or swift. However, rust isn't smart enough to perform the optimization I described, so no, it's not inspired by it :) Options that I see here: 1. Union. 2. Just pass both parameters and use one of them. Drawbacks: now we have 7 parameters, one will be passed through the stack, and it's datapath code. 3. Pass `struct iov_iter *` and cast it to `int offset` when zc_page isn't NULL. As we are compiling with -fno-strict-aliasing, and int shouldn't be bigger than a pointer on all supported architectures, it's going to work, and we still have 6 parameters. 4. Combine `int offset` and `int flags` into a single 64-bit parameter. Which one do you prefer, or do you have anything better in mind?If you declare the union on the stack in the callers, and pass by value - is the compiler not going to be clever enough to still DDRT?
Ah, OK, it should do the thing. I thought you wanted me to ditch the union altogether.
quoted
quoted
quoted
static int tls_push_data(struct sock *sk, - struct iov_iter *msg_iter, + union tls_iter_offset iter_offset, size_t size, int flags, - unsigned char record_type) + unsigned char record_type, + struct page *zc_page) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_prot_info *prot = &tls_ctx->prot_info;@@ -480,15 +486,29 @@ static int tls_push_data(struct sock *sk, } record = ctx->open_record; - copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); - copy = min_t(size_t, copy, (max_open_record_len - record->len)); - - if (copy) { - rc = tls_device_copy_data(page_address(pfrag->page) + - pfrag->offset, copy, msg_iter); - if (rc) - goto handle_error; - tls_append_frag(record, pfrag, copy); + + if (!zc_page) { + copy = min_t(size_t, size, pfrag->size - pfrag->offset); + copy = min_t(size_t, copy, max_open_record_len - record->len);Nope, refactor this please. 95% sure you don't need 4 indentation levels here. Space left in record can be calculated before the if, then you can do if (zc_page) { .. } else if (copy) { .. }It'll save one indentation level for the zc_page case, but not for the other: copy = min_t(size_t, size, max_open_record_len - record->len); if (zc_page && copy) { ... } else { // I still have to do this in non-zc case: copy = min_t(size_t, copy, pfrag->size - pfrag->offset);Can pfrag->size - pfrag->offset really be 0 provided tls_do_allocation() did not return an error?
No, it can't. Now it makes more sense to me. Thanks for the clarification.
quoted
if (copy) { // Same indentation level as in my patch. ... } } Is it good enough?quoted
quoted
We should allow setting the option for non-HW. It's an opt-in, the app has opted in, the fact that the kernel will not make use of the liberty to apply the optimization is not important, no?Yes, I agree that if the application opted in, it should work properly regardless of whether the optimization actually did turn on. However, the indication could be useful, for example, for diagnostic purposes, to show the user whether zerocopy mode was enabled, if someone is trying to debug some performance issue. If you insist, though, I can make setsockopt succeed and getsockopt return 1. What do you think?I'd say "whether the optimization is applicable" rather than "whether the optimization is turned on". User can check whether the connection is using SW or HW TLS if they want to make sure it's taken advantage of. Speaking of which, should we report the state of this knob via socket diag?
That sounds like an option, I'll take a look. TLS doesn't expose anything via diag yet, does it? The only option to distinguish SW/HW TLS is ethtool, and there is no per-socket check, right? Cause a HW TLS socket can downgrade to SW after tls_device_down, and ethtool won't show it.