Thread (45 messages) 45 messages, 6 authors, 2023-03-02

RE: Trying to reduce NFSv4 timeouts to a few seconds on an established connection

From: Andrew Klaassen <hidden>
Date: 2023-03-02 15:25:53

From: Jeff Layton <jlayton@kernel.org>
Sent: Tuesday, February 28, 2023 8:24 AM

On Mon, 2023-02-27 at 14:48 +0000, Andrew Klaassen wrote:
quoted
quoted
From: Andrew Klaassen <redacted>
Sent: Monday, February 6, 2023 12:19 PM
quoted
From: Andrew Klaassen <redacted>
Sent: Monday, February 6, 2023 10:28 AM
quoted
[snipping for readability; hope that's okay]

 - I'm allocating memory.  I assume that means I should free it
somewhere.
But where?  In xprt_destroy(), which appears to do cleanup?  Or in
xprt_destroy_cb(), which is called from xprt_destroy() and which
frees
xprt-
quoted
servername?  Or somewhere else completely?
 - If I free the allocated memory, will that cause any problems in
the cases where no timeout is passed in via the args and the
static const struct xs_tcp_default_timeout is assigned to
xprt->timeout?
 - If freeing the static const struct default will cause a
problem, what should I do instead?  Allocate and memcpy even when
assigning the default?  And would that mean doing the same thing
for all the other transports that are setting timeouts (local,
udp, tcp, and bc_tcp)?
[snipping more]
Here's the patch in what I hope is its final form.  I'm planning to
test it on a couple of hundred nodes over the next month or two.

Since I'm completely new to this, what would be the chances of
actually getting this patch in the kernel?

Thanks.

Andrew
Excellent work! I'll be interested to hear how the testing goes!


This patch still needs a bit of work. I'd consider this a proof-of- concept. You
are at least demonstrating the problem with this patch (and a possible
solution).

Conceptually, it's not 100% clear to me that we want the exact same timeout
on the RPC call and the xprt. We might, but working with interlocking
timeouts can bring in emergent behavioral changes and I haven't thought
through these.
At this point I'll admit that I don't fully understand the difference between those two, so I expect that your thoughts on it will be more relevant than mine.  :-)  Happy to get more of your feedback on it.  (I do notice that the patch causes timeouts during an initial mount that are twice as long as expected, so I assume that this is related to the two separate things you're talking about.)

Not knowing much, my initial guess is that the solution might be from options like:

 - Create a system-wide tuneable for xs_[local|udp|tcp]_default_timeout.  In our case that's less-than-ideal, since we want to change the total timeout for an NFS mount on a per-server or per-mount basis rather than a system-wide basis.

 - Add a second set of timeout options to NFS so that RPC call and xprt timeouts can be specified separately.  I'm guessing no-one is enthusiastic about option bloat, even if this would be the theoretically cleanest option.

 - Use timeo and retrans for the RPC call timeout, and retry for the xprt timeout.  Or do the opposite.  The NFS manpage describes the current behaviour incorrectly, so this at least wouldn't make the documentation any worse.

quoted
From caa3308a3bcf39eb95d9b59e63bd96361e98305e Mon Sep 17 00:00:00
2001
quoted
From: Andrew Klaassen <redacted>
Date: Fri, 10 Feb 2023 10:37:57 -0500
Subject: [PATCH] Sun RPC: Use passed-in timeouts if available instead
of  always using defaults.
This needs a real patch description. Describe the problem you were
having, and how this patch changes things to address it. Make sure you
add a Signed-off-by line too.

When you resend, send it to the the nfs client maintainers (Trond and
Anna) using git-format-patch and git-send-email, and cc linux-nfs list.
I think your MUA might have mangled the patch a bit. Please look over
Documentation/process/submitting-patches.rst in the kernel source tree
too.
Thanks for the tips.  I did read submitting-patches.rst, but obviously not carefully enough.  :-)  Would it be appropriate to submit the patch as-is (with check-patch.pl fixes), or should any potential interlocking-timeout issues be addressed first?

quoted
---
 include/linux/sunrpc/xprt.h |  3 +++
 net/sunrpc/clnt.c           |  1 +
 net/sunrpc/xprt.c           | 21 +++++++++++++++++++++
 net/sunrpc/xprtsock.c       | 22 +++++++++++++++++++---
 4 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b9f59aabee53..ca7be090cf83 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -333,6 +333,7 @@ struct xprt_create {
      struct svc_xprt         *bc_xprt;       /* NFSv4.1
backchannel */
      struct rpc_xprt_switch  *bc_xps;
      unsigned int            flags;
+     const struct rpc_timeout *timeout;      /* timeout parms */
 };

 struct xprt_class {
@@ -373,6 +374,8 @@
void                  xprt_release_xprt_cong(struct rpc_xprt *xprt, struct rpc_task
*task);
quoted
 void                 xprt_release(struct rpc_task *task);
 struct rpc_xprt *    xprt_get(struct rpc_xprt *xprt);
 void                 xprt_put(struct rpc_xprt *xprt);
+struct rpc_timeout   *xprt_alloc_timeout(const struct rpc_timeout
*timeo,
+                             const struct rpc_timeout
*default_timeo);
 struct rpc_xprt *    xprt_alloc(struct net *net, size_t size,
                              unsigned int num_prealloc,
                              unsigned int max_req);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0b0b9f1eed46..1350c1f489f7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -532,6 +532,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args
*args)
              .addrlen = args->addrsize,
              .servername = args->servername,
              .bc_xprt = args->bc_xprt,
+             .timeout = args->timeout,
      };
      char servername[48];
      struct rpc_clnt *clnt;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ab453ede54f0..0bb800c90976 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1801,6 +1801,26 @@ static void xprt_free_id(struct rpc_xprt *xprt)
      ida_free(&rpc_xprt_ids, xprt->id);
 }

+struct rpc_timeout *xprt_alloc_timeout(const struct rpc_timeout
*timeo,
+                                    const struct rpc_timeout
*default_timeo)
+{
+     struct rpc_timeout *timeout;
+
+     timeout = kzalloc(sizeof(*timeout), GFP_KERNEL);
+     if (!timeout)
+             return ERR_PTR(-ENOMEM);
+     if (timeo)
+             memcpy(timeout, timeo, sizeof(struct rpc_timeout));
+     else
+             memcpy(timeout, default_timeo, sizeof(struct
rpc_timeout));
I don't think you need an allocation here. struct rpc_timeout is quite
small and it only contains a bunch of integers. I think it'd be better
to just embed this in struct rpc_xprt instead.
quoted
+     return timeout;
+}
+
+static void xprt_free_timeout(struct rpc_xprt *xprt)
+{
+     kfree(xprt->timeout);
+}
+
 struct rpc_xprt *xprt_alloc(struct net *net, size_t size,
              unsigned int num_prealloc,
              unsigned int max_alloc)
@@ -1837,6 +1857,7 @@ EXPORT_SYMBOL_GPL(xprt_alloc);

 void xprt_free(struct rpc_xprt *xprt)
 {
+     xprt_free_timeout(xprt);
      put_net_track(xprt->xprt_net, &xprt->ns_tracker);
      xprt_free_all_slots(xprt);
      xprt_free_id(xprt);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index aaa5b2741b79..13703f8e0ef1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2924,7 +2924,12 @@ static struct rpc_xprt *xs_setup_udp(struct
xprt_create *args)

      xprt->ops = &xs_udp_ops;

-     xprt->timeout = &xs_udp_default_timeout;
+     xprt->timeout = xprt_alloc_timeout(args->timeout,
&xs_udp_default_timeout);
+     if (IS_ERR(xprt->timeout))
+     {
Kernel coding style has the brackets on the same line as the "if"
statement. You should run your next iteration through checkpatch.pl.
Thanks.  I did that once, but forgot to do it again before sending this version.

quoted
+             ret = ERR_CAST(xprt->timeout);
+             goto out_err;
+     }

      INIT_WORK(&transport->recv_worker,
xs_udp_data_receive_workfn);
      INIT_WORK(&transport->error_worker, xs_error_handle);
@@ -3003,7 +3008,13 @@ static struct rpc_xprt *xs_setup_tcp(struct
xprt_create *args)
      xprt->idle_timeout = XS_IDLE_DISC_TO;

      xprt->ops = &xs_tcp_ops;
-     xprt->timeout = &xs_tcp_default_timeout;
+
+     xprt->timeout = xprt_alloc_timeout(args->timeout,
&xs_tcp_default_timeout);
+     if (IS_ERR(xprt->timeout))
+     {
+             ret = ERR_CAST(xprt->timeout);
+             goto out_err;
+     }

      xprt->max_reconnect_timeout = xprt->timeout->to_maxval;
      xprt->connect_timeout = xprt->timeout->to_initval *
@@ -3071,7 +3082,12 @@ static struct rpc_xprt *xs_setup_bc_tcp(struct
xprt_create *args)
      xprt->prot = IPPROTO_TCP;
      xprt->xprt_class = &xs_bc_tcp_transport;
      xprt->max_payload = RPC_MAX_FRAGMENT_SIZE;
-     xprt->timeout = &xs_tcp_default_timeout;
+     xprt->timeout = xprt_alloc_timeout(args->timeout,
&xs_tcp_default_timeout);
+     if (IS_ERR(xprt->timeout))
+     {
+             ret = ERR_CAST(xprt->timeout);
+             goto out_err;
+     }

      /* backchannel */
      xprt_set_bound(xprt);
--
2.39.1
--
Jeff Layton [off-list ref]
Andrew Klaassen

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help