Re: Zombie / Orphan open files
From: Olga Kornievskaia <hidden>
Date: 2023-01-31 19:31:42
On Tue, Jan 31, 2023 at 12:12 PM Andrew J. Romero [off-list ref] wrote:
quoted
-----Original Message-----
From: Chuck Lever III <chuck.lever@oracle.com>
quoted
On Jan 31, 2023, at 9:42 AM, Andrew J. Romero [off-list ref] wrote:
In a large campus environment, usage of the relevant memory pool will eventually get so high that a server-side reboot will be needed.
The above is sticking with me a bit. Rebooting the server should force clients to re-establish state. Are they not re-establishing open file state for users whose ticket has expired?
quoted
I would think each client would re-establish state for those open files anyway, and the server would be in the same overcommitted state it was in before it rebooted.
When the number of opens gets close to the limit that would result in a disruptive NFSv4 service interruption (currently the limit is 128K open files), I do the reboot. (Actually, I transfer the affected NFS serving resource from one NAS cluster node to the other; based on experience, this is like a 99.9% "non-disruptive reboot" of the affected NFS serving resource.)
Before the resource transfer there will be ~126K open files (from the NAS perspective). 0.1 seconds after the resource transfer there will be close to zero files open. Within a few seconds there will be ~2000, and within a few minutes there will be ~2100. During the rest of the day I only see a slow rise in the average number of opens, to maybe 2200. (My take is that ~2100 files were "active opens" before and after the resource transfer; the rest of the 126K opens were zombies that the clients were no longer using.)
In 4-6 months the number of opens from the NAS perspective will slowly creep back up to the limit.
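To put rough numbers on the figures quoted above, here is a back-of-the-envelope sketch in Python. The counts come from the message; reading "128K" as 128,000 (rather than 131,072), the 30-day month, and the function names are assumptions made only for illustration.

```python
# Figures reported in the message above; everything else is an assumption.
OPENS_BEFORE = 126_000  # "~126K open files" just before the resource transfer
OPENS_AFTER = 2_100     # "~2100" a few minutes after the transfer
OPEN_LIMIT = 128_000    # "128K" limit; assumed to mean 128,000, not 131,072


def zombie_estimate(before: int, after: int) -> tuple[int, float]:
    """Return (count, fraction) of opens that clients did not re-establish."""
    zombies = before - after
    return zombies, zombies / before


def leak_rate_per_day(limit: int, baseline: int, days: int) -> float:
    """Average net growth in server-side opens needed to climb from baseline to limit."""
    return (limit - baseline) / days


zombies, fraction = zombie_estimate(OPENS_BEFORE, OPENS_AFTER)
print(f"estimated zombie opens: {zombies} ({fraction:.1%})")

for months in (4, 6):
    days = months * 30  # assumption: 30-day months
    rate = leak_rate_per_day(OPEN_LIMIT, OPENS_AFTER, days)
    print(f"{months} months to the limit: ~{rate:.0f} leaked opens per day")
```

Under these assumptions, about 98% of the server-side opens were not reclaimed after the transfer, and the creep back to the limit works out to several hundred to roughly a thousand leaked opens per day on average.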
What you are describing sounds like a bug in a system (be it client or server). There is state that the client thought it closed but the server is still keeping that state.
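One way to picture that mismatch is as a set difference between the opens the server still tracks and the opens the clients believe they hold. The sketch below is purely illustrative: the `OpenState` record, the `orphaned_opens` helper, and the sample data are hypothetical and do not correspond to any real NFS client or server interface.

```python
from typing import Iterable, NamedTuple


class OpenState(NamedTuple):
    """Hypothetical record identifying one open: which client, which file."""
    client_id: str
    file_handle: str


def orphaned_opens(
    server_view: Iterable[OpenState],
    client_view: Iterable[OpenState],
) -> set[OpenState]:
    """Opens the server still tracks that no client reports holding."""
    return set(server_view) - set(client_view)


# Made-up sample data: the server tracks three opens, the clients report one.
server_view = [
    OpenState("client-a", "fh-1"),
    OpenState("client-a", "fh-2"),
    OpenState("client-b", "fh-3"),
]
client_view = [OpenState("client-a", "fh-1")]

for state in sorted(orphaned_opens(server_view, client_view)):
    print(f"orphaned on server: {state.client_id} {state.file_handle}")
```

If the orphaned set keeps growing while the client-side view stays flat, that would be consistent with the client having dropped state that the server never released. Which side is at fault would still need to be determined separately.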
quoted
We might not have an accurate root cause analysis yet, or I could be missing something. -- Chuck Lever