Re: [PATCH v2] SUNRPC: Fixup v4.1 backchannel request timeouts
From: Benjamin Coddington <hidden>
Date: 2023-12-11 14:22:27
On 11 Dec 2023, at 8:54, Trond Myklebust wrote:
On Mon, 2023-12-11 at 06:53 -0500, Benjamin Coddington wrote:
On 9 Dec 2023, at 4:55, Trond Myklebust wrote:
On Fri, 2023-12-08 at 14:19 -0500, Benjamin Coddington wrote:
After commit 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on the sending list"), any 4.1 backchannel tasks placed on the sending queue would immediately return with -ETIMEDOUT since their req timers are zero.

We can fix this by keeping a copy of the rpc_clnt's timeout params on the transport and using them to properly setup the timeouts on the v4.1 backchannel tasks' req.

Fixes: 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on the sending list")
Signed-off-by: Benjamin Coddington <redacted>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/clnt.c           |  3 +++
 net/sunrpc/xprt.c           | 23 ++++++++++++++---------
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index f85d3a0daca2..7565902053f3 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -285,6 +285,7 @@ struct rpc_xprt {
 						 * items */
 	struct list_head	bc_pa_list;	/* List of preallocated
 						 * backchannel rpc_rqst's */
+	struct rpc_timeout	bc_timeout;	/* backchannel timeout params */
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
 	struct rb_root		recv_queue;	/* Receive queue */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d6805c1268a7..5891757c88b1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -279,6 +279,9 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,
 		clnt->cl_autobind = 1;
 	clnt->cl_timeout = timeout;
+#if defined(CONFIG_SUNRPC_BACKCHANNEL)
+	memcpy(&xprt->bc_timeout, timeout, sizeof(struct rpc_timeout));
+#endif

Hmm... The xprt can and will be shared among a number of rpc_clnt instances. I therefore think we're better off doing this when we're setting up the back channel

... and it seems the timeouts could be different for each, so now I think keeping a copy of the last rpc_clnt's timeouts on the xprt is wrong. We could use the current timeouts from the nfs_client side after we figure out which nfs_client that would be in nfs4_callback_compound().
Trouble is if we have to make a reply before getting that far there's still no timeout set, but that's probably a rare case and could use the xprt type defaults.

The suggestion is that we consider the callback channel to be associated with the 'service rpc_client', i.e. the one that is assigned to the nfs_client and that is used for lease recovery, etc.
Right, I'm on the same page - there's just no place to keep the timeout currently. We can't just carry it on the xprt since the xprt can be used for various clients, and there's multiple xprts to handle too.
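
To make the sharing problem concrete, here is a minimal userspace sketch, not kernel code. The struct and function names (timeout_params, transport, client, client_set_transport, bc_timeout_matches) are hypothetical stand-ins for rpc_timeout, rpc_xprt, rpc_clnt and the memcpy() added to rpc_clnt_set_transport() in the v2 patch. It assumes only what is said above: one timeout slot per transport, and more than one client attached to the same transport.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct timeout_params {
	unsigned long to_initval;	/* initial timeout, arbitrary ticks */
	unsigned int to_retries;	/* number of retries */
};

struct transport {
	struct timeout_params bc_timeout;	/* single slot, as in the v2 patch */
};

struct client {
	struct timeout_params timeout;
	struct transport *xprt;			/* may be shared with other clients */
};

/* Mimics the copy the v2 patch makes when a client is bound to a transport. */
static void client_set_transport(struct client *clnt, struct transport *xprt)
{
	assert(clnt && xprt);
	clnt->xprt = xprt;
	memcpy(&xprt->bc_timeout, &clnt->timeout, sizeof(xprt->bc_timeout));
}

/* Returns 1 if the transport's copy still matches this client's timeouts. */
static int bc_timeout_matches(const struct client *clnt)
{
	return clnt->xprt->bc_timeout.to_initval == clnt->timeout.to_initval &&
	       clnt->xprt->bc_timeout.to_retries == clnt->timeout.to_retries;
}
```

In this model, binding a second client with different timeouts to the same transport silently replaces the first client's values, which is the "copy of the last rpc_clnt's timeouts" behaviour objected to earlier in the thread.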
If you set it up after the negotiation of the session, and after picking up the lease time, then you should have a useful value. There will be no other client activity anyway until the client is marked as being ready, so you can shoehorn it in there (in nfs41_init_clientid() as suggested below) to address the common case. Then there is the issue of bind_conn_to_session (when we're setting up the callback channel anew because the server lost it) and the cases where we're adding new xprts.
I think all these cases can be avoided by setting the callback responses' req timeout values during callback processing after we've looked up the appropriate nfs_client for the response. Then we can just use the cl_timeout, no need to touch the xprts. Otherwise, I'm looking at every xprt having a list of bc_timeouts for every rpc_client or some mess like that.

I really appreciate your suggestions,
Ben
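
A rough userspace sketch of that last idea follows: arm the reply's timers from the owning client's timeout once the client is known, and fall back to transport defaults when it is not. It is not kernel code. The field names echo the kernel's (rq_timeout, rq_majortimeo, to_initval, to_maxval, cl_timeout), but the structs are simplified stand-ins, the helpers init_reply_timeout() and request_expired() are hypothetical, and the expiry check is a plain comparison that ignores counter wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct timeout_params {
	unsigned long to_initval;	/* initial timeout */
	unsigned long to_maxval;	/* overall (major) timeout */
};

struct client {
	const struct timeout_params *cl_timeout;
};

struct request {
	unsigned long rq_timeout;	/* 0 means the timers were never armed */
	unsigned long rq_majortimeo;	/* absolute deadline */
};

/*
 * Arm the reply's timers from the client found during callback processing.
 * If no client has been looked up yet (clnt == NULL), use the transport
 * defaults, as floated in the thread for the rare early-reply case.
 */
static void init_reply_timeout(struct request *req, const struct client *clnt,
			       const struct timeout_params *xprt_default,
			       unsigned long now)
{
	const struct timeout_params *to =
		(clnt && clnt->cl_timeout) ? clnt->cl_timeout : xprt_default;

	assert(req && to);
	req->rq_timeout = to->to_initval;
	req->rq_majortimeo = now + to->to_maxval;
}

/* Returns 1 if a queued request would be treated as already timed out. */
static int request_expired(const struct request *req, unsigned long now)
{
	return now >= req->rq_majortimeo;
}
```

With a zeroed request, request_expired() is true for any current time, which models the immediate -ETIMEDOUT described in the patch; once init_reply_timeout() has run, the deadline comes from the client's own parameters and nothing is stored on the transport.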