Thread (12 messages) 12 messages, 3 authors, 2022-05-06

Re: [PATCH net-next] tls: Add opt-in zerocopy mode of sendfile()

From: Maxim Mikityanskiy <hidden>
Date: 2022-05-03 18:57:09

On 2022-04-29 22:11, Jakub Kicinski wrote:
On Fri, 29 Apr 2022 17:21:59 +0300 Maxim Mikityanskiy wrote:
quoted
On 2022-04-29 01:11, Jakub Kicinski wrote:
quoted
quoted
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index b12f81a2b44c..715401b20c8b 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -411,10 +411,16 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
   	return 0;
   }
   
+union tls_iter_offset {
+	struct iov_iter *msg_iter;
+	int offset;
+};
Is this sort of atrocity used elsewhere in the kernel?
If you can't refactor the code you can pack args into
a structure
What's the point of packing arguments into a struct in this particular
case? Depending on zc_page, I need either msg_iter or offset, and I'm
reusing the same CPU register to pass either of them. The performance
isn't affected, and the amount of memory used is the same. A struct
won't allow to achieve this, it would force me to drag 8 extra bytes,
but we already use all 6 registers used to pass parameters on x86_64.
I know why you're doing this, but you're not writing assembly:

+	rc = tls_push_data(sk, (union tls_iter_offset)&msg_iter, size,

+	return tls_push_data(sk, (union tls_iter_offset)&msg_iter, 0, flags,

even if it's legal C (i.e. not UB) it looks awkward.
It's not UB, cast between a union and its element type is well-defined in C.
quoted
quoted
but I've not seen people cast mutually exclusive
arguments to a union type.
It's the purpose of a union, to hold one of mutually exclusive values,
isn't it?
The union itself is not the problem.
quoted
quoted
Is this "inspired" by some higher
level language?
It's unfortunately inspired by C and its freedom to allow
microoptimizations/hacks. The hack here is that I use a pointer being
NULL or not-NULL as an indicator what type the other argument has.

The closest alternative from high-level languages I can think of is
enums with attached data from rust or swift. However, rust isn't smart
enough to perform the optimization I described, so no, it's not inspired
by it :)

Options that I see here:

1. Union.

2. Just pass both parameters and use one of them. Drawbacks: now we have
7 parameters, one will be passed through the stack, and it's datapath code.

3. Pass `struct iov_iter *` and cast it to `int offset` when zc_page
isn't NULL. As we are compiling with -fno-strict-aliasing, and int
shouldn't be bigger than a pointer on all supported architectures, it's
going to work, and we still have 6 parameters.

4. Combine `int offset` and `int flags` into a single 64-bit parameter.

Which one do you prefer, or do you have anything better in mind?
If you declare the union on the stack in the callers, and pass by value
- is the compiler not going to be clever enough to still DDRT?
Ah, OK, it should do the thing. I thought you wanted me to ditch the 
union altogether.
quoted
quoted
quoted
   static int tls_push_data(struct sock *sk,
-			 struct iov_iter *msg_iter,
+			 union tls_iter_offset iter_offset,
   			 size_t size, int flags,
-			 unsigned char record_type)
+			 unsigned char record_type,
+			 struct page *zc_page)
   {
   	struct tls_context *tls_ctx = tls_get_ctx(sk);
   	struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -480,15 +486,29 @@ static int tls_push_data(struct sock *sk,
   		}
   
   		record = ctx->open_record;
-		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
-		copy = min_t(size_t, copy, (max_open_record_len - record->len));
-
-		if (copy) {
-			rc = tls_device_copy_data(page_address(pfrag->page) +
-						  pfrag->offset, copy, msg_iter);
-			if (rc)
-				goto handle_error;
-			tls_append_frag(record, pfrag, copy);
+
+		if (!zc_page) {
+			copy = min_t(size_t, size, pfrag->size - pfrag->offset);
+			copy = min_t(size_t, copy, max_open_record_len - record->len);
Nope, refactor this please. 95% sure you don't need 4 indentation
levels here. Space left in record can be calculated before the if,
then you can do

if (zc_page) {
	..
} else if (copy) {
	..
}
It'll save one indentation level for the zc_page case, but not for the
other:

copy = min_t(size_t, size, max_open_record_len - record->len);
if (zc_page && copy) {
      ...
} else {
      // I still have to do this in non-zc case:
      copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
Can pfrag->size - pfrag->offset really be 0 provided
tls_do_allocation() did not return an error?
No, it can't. Now it makes more sense to me. Thanks for the clarification.
quoted
      if (copy) {
          // Same indentation level as in my patch.
          ...
      }
}

Is it good enough?
quoted
quoted
We should allow setting the option for non-HW. It's an opt-in, the app
has opted in, the fact that the kernel will not make use of the liberty
to apply the optimization is not important, no?
Yes, I agree that if the application opted in, it should work properly
regardless of whether the optimization actually did turn on. However,
the indication could be useful, for example, for diagnostic purposes, to
show the user whether zerocopy mode was enabled, if someone is trying to
debug some performance issue. If you insist, though, I can make
setsockopt succeed and getsockopt return 1. What do you think?
I'd say "whether the optimization is applicable" rather than "whether
the optimization is turned on". User can check whether the connection
is using SW or HW TLS if they want to make sure it's taken advantage of.

Speaking of which, should we report the state of this knob via socket
diag?
That sounds like an option, I'll take a look. TLS doesn't expose 
anything via diag yet, does it? The only option to distinguish SW/HW TLS 
is ethtool, and there is no per-socket check, right? Cause a HW TLS 
socket can downgrade to SW after tls_device_down, and ethtool won't show it.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help