Re: NFSv4.0 Linux client fails to return delegation
From: dai.ngo@oracle.com
Date: 2023-06-20 15:27:22
On 6/19/23 12:19 PM, Trond Myklebust wrote:
> Hi Dai,
>
> On Mon, 2023-06-19 at 10:02 -0700, dai.ngo@oracle.com wrote:
> > Hi Trond,
> >
> > I'm testing the NFS server with write delegation support and the Linux client using NFSv4.0, and I've run into a situation that needs your advice.
> >
> > In this scenario, the NFS server grants a write delegation to the client. Later, when the client returns the delegation, it sends the compound PUTFH, GETATTR, DELEGRETURN. When the NFS server services the GETATTR, it detects that there is a write delegation on this file, but it cannot detect that this GETATTR request was sent by the same client that owns the write delegation (due to the nature of the NFSv4.0 compound). As a result, the server sends CB_RECALL to recall the delegation and replies NFS4ERR_DELAY to the GETATTR request.
> >
> > When the client receives the NFS4ERR_DELAY, it retries with the same compound PUTFH, GETATTR, DELEGRETURN, and the server again replies NFS4ERR_DELAY. This process repeats until the recall times out and the delegation is revoked by the server.
> >
> > I noticed that the current order of GETATTR and DELEGRETURN was done by commit e144cbcc251f. Later, commit 8ac2b42238f5 was added to drop the GETATTR if the request was rejected with EACCES.
> >
> > Do you have any advice on where, on the server or the client, this issue should be addressed?
>
> This wants to be addressed in the server. The client has a very good reason for wanting to retrieve the attributes before returning the delegation here: it needs to update the change attribute while it is still holding the delegation in order to ensure close-to-open cache consistency.
>
> Since you do have a stateid in the DELEGRETURN, it should be possible to determine that this is indeed the client that holds the delegation.
Thank you Trond. I'll wait for Chuck to decide what to do next.

-Dai
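For illustration, the server-side check Trond describes could be modelled roughly as follows. This is a toy Python sketch, not nfsd code: the names `Op`, `WriteDelegation`, `compound_returns_delegation`, and `process_getattr` are hypothetical, the stateid is reduced to an opaque byte string, and looking ahead in the compound for a matching DELEGRETURN is only one assumed way of using the stateid. The thread itself does not say how the server should implement it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"


@dataclass(frozen=True)
class Op:
    """One operation in a compound (hypothetical, heavily simplified)."""
    name: str                       # e.g. "PUTFH", "GETATTR", "DELEGRETURN"
    stateid: Optional[bytes] = None  # only set for ops that carry a stateid


@dataclass
class WriteDelegation:
    """Server-side record of an outstanding write delegation (hypothetical)."""
    stateid: bytes
    recalled: bool = False


def compound_returns_delegation(ops: Sequence[Op], index: int,
                                deleg: WriteDelegation) -> bool:
    """True if an op after `index` is a DELEGRETURN carrying deleg's stateid."""
    return any(op.name == "DELEGRETURN" and op.stateid == deleg.stateid
               for op in ops[index + 1:])


def process_getattr(ops: Sequence[Op], index: int,
                    deleg: Optional[WriteDelegation]) -> str:
    """Decide how to answer the GETATTR at position `index` in the compound."""
    if deleg is None:
        return NFS4_OK              # no write delegation, nothing to recall
    if compound_returns_delegation(ops, index, deleg):
        return NFS4_OK              # same compound returns this delegation
    deleg.recalled = True           # stands in for sending CB_RECALL
    return NFS4ERR_DELAY


# The compound from the report: PUTFH, GETATTR, DELEGRETURN.
deleg = WriteDelegation(stateid=b"deleg-1")
compound = [Op("PUTFH"), Op("GETATTR"), Op("DELEGRETURN", b"deleg-1")]
print(process_getattr(compound, 1, deleg))
```

In this model, a GETATTR from a compound that does not carry the matching DELEGRETURN still triggers the recall and gets NFS4ERR_DELAY, so conflicting clients are handled as before. A real server would also have to confirm that the DELEGRETURN applies to the same filehandle.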