Re: Zombie / Orphan open files
From: Olga Kornievskaia <hidden>
Date: 2023-01-31 22:14:20
On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero [off-list ref] wrote:
What you are describing sounds like a bug in a system (be it client or server). There is state that the client thought it closed but the server is still keeping that state.
Hi Olga,
Based on my simple test script experiment, here's a summary of what I believe is happening:
1. An interactive user starts a process that opens one or more files.
2. A disruption that prevents NFS-client <-> NFS-server communication occurs while the file is open. This could be due to having the file open a long time or due to opening the file too close to the time of the disruption. (I believe the most common "disruption" is credential expiration.)
3. The user's process terminates before the disruption is cleared. (Or, stated another way, the disruption is not cleared until after the user process terminates.)
At the time the user process terminates, the process cannot tell the server to close the server-side file state. After the process terminates, nothing will ever tell the server to close the files. The now-zombie open files will continue to consume server-side resources. In environments with many users, the problem is significant.
My reasons for posting are not to have your team help troubleshoot my specific issue (that would be quite rude). They are:
- Determine if my NAS vendor might be accidentally not doing something they should be. (I now don't really think this is the case.)
It's hard to say who's at fault here without having some more info like tracepoints or network traces.
- Determine if this is a known behavior common to all NFS implementations (Linux, etc.) and, if so, have your team determine whether this is a problem that should be addressed in the spec and the implementations.
What you describe (the client and server having different views of the state) is not a known common behaviour. I tried it on my Kerberos setup. I got a 5-minute ticket and, as that user, opened a file in a process that then went to sleep. My user credentials expired after 5 minutes; I verified that by running "ls" on the mounted filesystem, which returned a permission denied error. Then I killed the application that had the file open. This resulted in an NFS CLOSE being sent to the server using the machine's GSS context (which is the default behaviour of the Linux client regardless of whether or not the user's credentials are valid). Basically, as far as I can tell, a Linux client can handle cleaning up state when a user's credentials have expired.
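A minimal sketch of the "open a file, then sleep" process from the test above, in case someone wants to repeat it. It assumes an NFSv4 export mounted with Kerberos security; the script name `hold_open.py`, its arguments, and the default hold time are hypothetical, not taken from either party's test. Get a short-lived ticket first (for example with MIT Kerberos `kinit -l 5m`), start the script against a file on the mount, wait for the ticket to expire, then kill the process and check a network trace for the CLOSE.

```python
import os
import sys
import time


def hold_file_open(path, hold_seconds):
    """Open `path`, keep it open for `hold_seconds`, then close it.

    If the process is killed while sleeping, the kernel still closes the
    descriptor on exit; on an NFSv4 mount that is the point at which the
    client is expected to send CLOSE for the open state.
    """
    with open(path, "a") as f:
        # Write a marker line so the file shows which process opened it.
        f.write("held open by pid %d\n" % os.getpid())
        f.flush()
        time.sleep(hold_seconds)
    return path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): hold_open.py PATH [SECONDS]; PATH is on the NFS mount.
    # Default hold time is an assumption, chosen to outlast a 5-minute ticket.
    seconds = float(sys.argv[2]) if len(sys.argv) > 2 else 3600.0
    print("pid %d holding %s open" % (os.getpid(), sys.argv[1]), flush=True)
    hold_file_open(sys.argv[1], seconds)
```

Run it in the background (e.g. `python3 hold_open.py /path/on/mount/testfile &`), confirm that `ls` on the mount fails once the ticket has expired, and then `kill` the printed pid.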
Andy