Re: Zombie / Orphan open files
From: Jeff Layton <jlayton@kernel.org>
Date: 2023-01-31 18:13:39
On Tue, 2023-01-31 at 16:34 +0000, Chuck Lever III wrote:
> On Jan 31, 2023, at 9:42 AM, Andrew J. Romero [off-list ref] wrote:
>
>> In a large campus environment, usage of the relevant memory pool will eventually get so high that a server-side reboot will be needed.
>
> The above is sticking with me a bit. Rebooting the server should force clients to re-establish state. Are they not re-establishing open file state for users whose ticket has expired? I would think each client would re-establish state for those open files anyway, and the server would be in the same overcommitted state it was in before it rebooted.
>
> We might not have an accurate root cause analysis yet, or I could be missing something.
My assumption was that the client wasn't able to get credentials to run the CLOSE RPC in this case, so it couldn't properly send the call. That's a big assumption though. It'd be good to confirm this.

It looks like the CLOSE codepath on the client calls nfs4_state_protect with NFS_SP4_MACH_CRED_CLEANUP, and that should make it use the machine cred? I'm not 100% clear here though; it looks like that may be conditional on what was sent by the server in EXCHANGE_ID.

FWIW, I don't see any reason we shouldn't use the machine cred for the close compound. Nothing we do in there should require permission checking.

BTW: is this NFSv4.0 or v4.1+ (or a mix)?

-- 
Jeff Layton [off-list ref]
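The credential question raised in this message can be modelled roughly as below. This is a hypothetical Python sketch, not the Linux client's code: `Cred`, `ExchangeIdReply`, `close_cred` and `can_send_close` are invented names. It assumes that state protection (SP4_MACH_CRED) can only be negotiated on NFSv4.1+, since NFSv4.0 has no EXCHANGE_ID (it uses SETCLIENTID), and that the client may send CLOSE with the machine cred only if the server allowed CLOSE under that protection. The real decision inside nfs4_state_protect may depend on further conditions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Cred(Enum):
    USER = "user"        # the opening user's credential (e.g. a Kerberos ticket)
    MACHINE = "machine"  # the client host's machine credential


@dataclass(frozen=True)
class ExchangeIdReply:
    """Hypothetical summary of the state-protection result of EXCHANGE_ID."""
    sp4_mach_cred: bool                   # server agreed to SP4_MACH_CRED
    must_allow: frozenset = frozenset()   # ops the machine cred may perform


def close_cred(minor_version: int, reply: Optional[ExchangeIdReply]) -> Cred:
    """Pick the credential for CLOSE under the assumptions stated above."""
    # NFSv4.0: no EXCHANGE_ID, so no negotiated state protection;
    # CLOSE goes out with the user's credential.
    if minor_version == 0 or reply is None:
        return Cred.USER
    # NFSv4.1+: use the machine cred only if the server negotiated
    # SP4_MACH_CRED and allows CLOSE under it.
    if reply.sp4_mach_cred and "CLOSE" in reply.must_allow:
        return Cred.MACHINE
    return Cred.USER


def can_send_close(cred: Cred, user_ticket_valid: bool) -> bool:
    """Assumes the machine cred is always obtainable (e.g. from a host keytab)."""
    return cred is Cred.MACHINE or user_ticket_valid


if __name__ == "__main__":
    granted = ExchangeIdReply(sp4_mach_cred=True, must_allow=frozenset({"CLOSE"}))
    for minor, reply in [(0, None), (1, ExchangeIdReply(False)), (1, granted)]:
        cred = close_cred(minor, reply)
        ok = can_send_close(cred, user_ticket_valid=False)
        print(f"v4.{minor}: CLOSE uses {cred.value} cred, sendable with expired ticket: {ok}")
```

Under these assumptions, an open whose owner's ticket has expired can only be closed when the machine cred path applies; in every other case the CLOSE cannot be sent and the open state lingers on the server, which is why the NFSv4.0 vs v4.1+ question matters.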