Thread (45 messages, 6 authors, 2023-03-02)

Re: Zombie / Orphan open files

From: Olga Kornievskaia <hidden>
Date: 2023-01-31 19:08:48

On Tue, Jan 31, 2023 at 1:35 PM Andrew J. Romero [off-list ref] wrote:

That's not the way state recovery works. Clients will reopen only
the files that are still in use. If the clients don't open the
"zombie" files again, then I'm fairly certain the applications
have already closed those files.
Hi

In the case of my test script, I know that the files were not
closed explicitly or on script termination.
How do you know that the files were not closed on the script
termination? One way to see what the OS might be doing for you is to
grab either a set of tracepoints or a network trace. A client would
have sent the close but it was for some reason rejected by the server?
(The script terminated without credentials.) By the time my session re-acquired credentials
(intentionally, after process termination), the process was already gone,
and nothing on the client would ever attempt to clean up the
server-side "zombie open files".

The server-side pool usage caused by my intentionally
bad test script was not freed up until I did the cluster resource migration.
Once you did a migration event (which is how storage can recover from
having unrecoverable state, btw), if the client (i.e., the kernel, not
the script) "truly" didn't close the files, then the kernel would have
recovered the open state again. However, I suspect that the resource
migration event helped get out of a bad state. That means the
client (i.e., the kernel) did try to close the file but failed to do so
(lack of creds, as you say), and since the kernel won't keep retrying
without creds forever, it may give up on the close.
Yet on the server side that state would remain, and something like a
migration event (which is non-disruptive to the client) is a way to
get out of such situations.
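A toy Python model of the failure mode described above may make the sequence easier to follow. It is a sketch under stated assumptions, not the Linux client or nfsd code: `ToyServer`, `ToyClient`, `max_close_retries`, and the `creds_valid` flag are hypothetical names. It shows a CLOSE rejected for lack of credentials, the client eventually forgetting the file anyway, the server keeping the open state, and a recovery (migration) event rebuilding server state only from what the client reclaims.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server state: one entry per (client, path) open."""
    opens: set[tuple[str, str]] = field(default_factory=set)

    def open(self, client: str, path: str) -> None:
        self.opens.add((client, path))

    def close(self, client: str, path: str, creds_valid: bool) -> bool:
        # Assumption: a CLOSE sent without valid credentials is rejected.
        if not creds_valid:
            return False
        self.opens.discard((client, path))
        return True

    def recover(self, reclaims: set[tuple[str, str]]) -> None:
        # After a migration/recovery event, state is rebuilt only from
        # what clients reclaim; anything not reclaimed is dropped.
        self.opens = set(reclaims)


@dataclass
class ToyClient:
    name: str
    server: ToyServer
    max_close_retries: int = 3  # hypothetical retry limit
    open_files: set[str] = field(default_factory=set)

    def open(self, path: str) -> None:
        self.server.open(self.name, path)
        self.open_files.add(path)

    def process_exit(self, creds_valid: bool) -> None:
        # The kernel tries to close every file the process left open.
        for path in list(self.open_files):
            for _ in range(self.max_close_retries):
                if self.server.close(self.name, path, creds_valid):
                    break
            # Whether or not the CLOSE succeeded, the client gives up
            # and forgets the file locally.
            self.open_files.discard(path)

    def reclaims(self) -> set[tuple[str, str]]:
        # Only files the client still considers open are reclaimed.
        return {(self.name, p) for p in self.open_files}


server = ToyServer()
client = ToyClient("client1", server)
client.open("/export/data.txt")
client.process_exit(creds_valid=False)
print(server.opens)  # {('client1', '/export/data.txt')}  (the "zombie")
server.recover(client.reclaims())
print(server.opens)  # set()
```

In this model the zombie entry survives until the recovery step, because the client no longer lists the file among its reclaims.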
Question:
When a simple app (for example, a Python script) on the NFS client
simply opens a text file, is a lease automatically created on the
server behind the scenes? If so, is the server responsible for
closing the file if the lease isn't renewed every N minutes?

By "simply opens" a text file, I mean that the script contains no
code that requests or in any way explicitly uses locks.
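A minimal Python sketch of how lease expiry relates to open state, assuming the NFSv4 design in which a lease belongs to the whole client (its client ID) rather than to individual files. `LeaseServer`, `renew`, `reap`, and the 90-second period are illustrative assumptions, not the Linux nfsd implementation; real servers make the lease period configurable.

```python
from dataclasses import dataclass, field

LEASE_SECONDS = 90.0  # assumption: illustrative lease period


@dataclass
class LeaseServer:
    lease_seconds: float = LEASE_SECONDS
    expiry: dict[str, float] = field(default_factory=dict)     # client -> deadline
    opens: dict[str, set[str]] = field(default_factory=dict)   # client -> open paths

    def renew(self, client: str, now: float) -> None:
        # Any lease-renewing request extends the client's single lease.
        self.expiry[client] = now + self.lease_seconds

    def open(self, client: str, path: str, now: float) -> None:
        self.renew(client, now)
        self.opens.setdefault(client, set()).add(path)

    def reap(self, now: float) -> None:
        # There is no per-file timer: a client loses its opens only
        # when its whole lease has lapsed.
        for client, deadline in list(self.expiry.items()):
            if now > deadline:
                del self.expiry[client]
                self.opens.pop(client, None)


server = LeaseServer()
server.open("client1", "/export/a.txt", now=0)
# The app is gone, but the client keeps renewing its lease for other work.
for t in range(60, 600, 60):
    server.renew("client1", t)
    server.reap(t)
print(server.opens)  # {'client1': {'/export/a.txt'}}
# The client goes silent; once its lease lapses, all its state is reaped.
server.reap(700)
print(server.opens)  # {}
```

In this model, an open left behind by an exited application is released only if the entire client stops renewing its lease; a client that stays active keeps every one of its opens alive, zombie or not.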



Thanks