Thread: 45 messages, 6 authors, 2023-03-02

Re: Zombie / Orphan open files

From: Olga Kornievskaia <hidden>
Date: 2023-01-31 22:14:20

On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero [off-list ref] wrote:

quoted
What you are describing sounds like a bug in a system (be it client or
server). There is state that the client thought it closed but the
server still keeping that state.
Hi Olga

Based on my simple test script experiment,
Here's a summary of what I believe is happening

1. An interactive user starts a process that opens a file or multiple files

2. A disruption, that prevents
   NFS-client <-> NFS-server communication,
   occurs while the file is open.  This could be due to
   having the file open a long time or due to opening the file
   too close to the time of disruption.

( I believe the most common "disruption" is
  credential expiration )

3. The user's process terminates before the disruption
   is cleared ( or, stated another way, the disruption is not cleared
   until after the user process terminates ).

   At the time the user process terminates, the process
   can not tell the server to close the server-side file state.

  After the process terminates, nothing will ever tell the server
  to close the files.  The now zombie open files will continue to
  consume server-side resources.

  In environments with many users, the problem is significant

My reasons for posting:

- are not to have your team help troubleshoot my specific issue
   ( that would be quite rude )

they are:

- Determine if my NAS vendor might be accidentally
  not doing something they should be.
  ( I now don't really think this is the case. )

It's hard to say who's at fault here without having some more info
like tracepoints or network traces.

- Determine if this is a known behavior common to all NFS implementations
   ( Linux, etc. ) and, if so, have your team determine whether this is a problem
   that should be addressed in the spec and the implementations.

What you describe, the client and server having different views of
state, is not a known common behaviour.

I have tried it on my Kerberos setup.
I got a 5-minute ticket.
As a user, I opened a file in a process that then went to sleep.
My user credentials expired (after 5 minutes). I verified that by
doing an "ls" on a mounted filesystem, which resulted in a permission
denied error.
Then I killed the application that had the file open. This resulted
in an NFS CLOSE being sent to the server using the machine's GSS
context (which is the default behaviour of the Linux client regardless
of whether or not the user's credentials are valid).

Basically, as far as I can tell, a Linux client can handle cleaning up
state when the user's credentials have expired.
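
For anyone who wants to repeat the test described above, here is a minimal
sketch of a process that opens a file and then sleeps. It is not the exact
program used in the test; the function name hold_file_open, the example
path, and the 600-second duration are all hypothetical. Run it as the
Kerberos user on the NFS mount, wait for the ticket to expire, then kill it
and watch for the CLOSE in a network trace.

```python
import os
import sys
import time


def hold_file_open(path, hold_seconds):
    """Open `path` read-only, keep it open for `hold_seconds`, then close it.

    Returns the file descriptor number that was held open.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Sleep with the file open. If the process is killed here, the
        # kernel still closes the descriptor as part of process teardown.
        time.sleep(hold_seconds)
    finally:
        os.close(fd)
    return fd


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 hold_open.py /mnt/nfs/testfile 600
    print(f"pid {os.getpid()} holding {sys.argv[1]} open")
    hold_file_open(sys.argv[1], float(sys.argv[2]))
```

The hold time should be longer than the ticket lifetime so that the file is
still open when the credentials expire.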


Andy


