RE: [PATCH 1/1] SUNRPC: Use rpc_create_args->timeout to initialize rpc_xprt->timeout
From: Andrew Klaassen <hidden>
Date: 2023-03-16 14:47:54
From: Trond Myklebust <redacted> Sent: Tuesday, March 14, 2023 2:40 PM
On Mar 13, 2023, at 11:17, Andrew Klaassen[off-list ref] wrote:
We are using applications which hang if any NFS servers fail to respond. We would like to be able to control NFS timeouts so that we can control the maximum time that the applications hang. We currently can't do that with TCP NFS mounts, since RPC calls made to an existing NFS mount are first subject to the default untuneable Sun RPC timeout of 2 minutes.

(I'll note that the existing NFS manpage seems to not describe current behaviour correctly, since it says that this two-minute timeout applies to initial mount operations (which it does not), and does not say that the two-minute timeout applies to operations on existing mounts (which it does).)

An existing thread discussing this patch can be found here:

Link: https://lore.kernel.org/linux-nfs/45e2e7f05a13abab777b3b0868744cdbfc623f2d.camel@kernel.org/T/

This patch uses the RPC call timeout to set the xprt timeout. In that discussion thread, Jeff Layton has pointed out that this may or may not be the ideal approach. I have suggested these alternatives, and would be happy to get feedback:

- Create system-wide tuneables for xs_[local|udp|tcp]_default_timeout. In our case that's less-than-ideal, since we want to change the total timeout for an NFS mount on a per-server or per-mount basis rather than a system-wide basis, but it would do in a pinch.

- Add a second set of timeout options to NFS so that RPC call and xprt timeouts can be specified separately. I'm guessing no-one is enthusiastic about option bloat, even if this would be the theoretically cleanest option. I'm guessing this would also involve changing the Sun RPC API and everything that calls it in order for it to accept the second set of timeout options.

- Use timeo and retrans for the RPC call timeout, and retry for the xprt timeout. Or do the opposite. The NFS manpage describes the current behaviour incorrectly, so this at least wouldn't make the documentation any worse. I assume this would also involve changing the Sun RPC API.
Use rpc_create_args->timeout to initialize rpc_xprt->timeout

Just because something can be done in the kernel, it doesn’t mean that it should be done in the kernel. If you’re unhappy with sunrpc timeouts, then it should be quite possible to do those calls in userspace, and pass the port number down as part of the mount syscall.
Thanks for the direction, Trond. I'll spend some time getting familiar with the code and see if I can make that happen.

I'm currently clueless about how to get started, as there doesn't appear to be any way to override sunrpc timeout defaults for any sunrpc call, so I may have some followup questions once I get my head wrapped around the mount code.

Andrew