Re: simple NFSv4.1/4.2 test of remove while holding a delegation
From: Dai Ngo <dai.ngo@oracle.com>
Date: 2025-06-10 11:58:40
On 6/9/25 6:06 PM, Rick Macklem wrote:
On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo [off-list ref] wrote:
On 6/9/25 4:35 PM, Rick Macklem wrote:
Hi,

I hope you don't mind a cross-post, but I thought both groups might find this interesting...

I have been creating a compound RPC that does REMOVE and then tries to determine if the file object has been removed and I was surprised to see quite different results from the Linux knfsd and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers provide FH4_PERSISTENT file handles, although I suppose I should check that?

First, the test OPEN/CREATEs a regular file called "foo" (only one hard link) and acquires a write delegation for it. Then a compound does the following:
...
REMOVE foo
PUTFH fh for foo
GETATTR

For the Solaris 11.4 server, the server CB_RECALLs the delegation and then replies NFS4ERR_STALE for the PUTFH above. (The FreeBSD server currently does the same.)

For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK with nlinks == 0 in the GETATTR reply.

Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I probably missed something) and I cannot find anything that states either of the above behaviours is incorrect. (NFS4ERR_STALE is listed as an error code for PUTFH, but the description of PUTFH only says that it sets the CFH to the fh arg. It does not say anything w.r.t. the fh arg. needing to be for a file that still exists.) Neither of these servers sets OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.

So, it looks like "file object no longer exists" is indicated either by a NFS4ERR_STALE reply to either PUTFH or GETATTR OR by a successful reply, but with nlinks == 0 for the GETATTR reply.

To be honest, I kinda like the Linux knfsd version, but I am wondering if others think that both of these replies is correct?

Also, is the CB_RECALL needed when the delegation is held by the same client as the one doing the REMOVE?

The Linux NFSD detects the delegation belongs to the same client that causes the conflict (due to REMOVE) and skips the CB_RECALL.
This is an optimization based on the assumption that the client would handle the conflict locally.

And then what does the server do with the delegation?
- Does it just discard it, since the file object has been deleted? OR
- Does it guarantee that a DELEGRETURN done after the REMOVE will still work (which seems to be the case for the 6.12 server I am using for testing).
The delegation remains valid but the file was removed from the namespace. This is why the PUTFH and GETATTR in your test did not fail. However, any lookup of the file will fail.
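
The two server behaviours reported in this thread (NFS4ERR_STALE from Solaris 11.4 and FreeBSD, NFS_OK with a zero link count from Linux knfsd) could be folded into one client-side check. What follows is only a minimal sketch, not code from any client: file_object_removed() and its parameters are hypothetical names, and it assumes the caller has already pulled the PUTFH status, the GETATTR status and the link count out of the compound reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status values as assigned in RFC 8881. */
#define NFS4_OK        0
#define NFS4ERR_STALE  70

/*
 * Hypothetical helper: decide whether the file object is gone, given the
 * results of the PUTFH and GETATTR that follow REMOVE in the compound.
 * If PUTFH failed, GETATTR was never executed, so getattr_status and
 * numlinks are only meaningful when putfh_status is NFS4_OK.
 */
bool
file_object_removed(uint32_t putfh_status, uint32_t getattr_status,
    uint32_t numlinks)
{
	/* Solaris 11.4 / FreeBSD style: the file handle has gone stale. */
	if (putfh_status == NFS4ERR_STALE)
		return (true);
	if (putfh_status != NFS4_OK)
		return (false);	/* some other error, nothing can be concluded */
	if (getattr_status == NFS4ERR_STALE)
		return (true);
	/* Linux knfsd style: success, but no links remain. */
	return (getattr_status == NFS4_OK && numlinks == 0);
}
```

Any other error status is treated as "cannot tell" here; a real client would presumably handle those cases separately.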
If the REMOVE was done by another client, the REMOVE will not complete until the delegation is returned. If the PUTFH comes after the REMOVE was completed, it'll fail with NFS4ERR_STALE since the file, specified by the file handle, no longer exists.

Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to "REMOVE done by another client" then that sounds fine.

Correction: even if the REMOVE was done by another client and the delegation was recalled from the 1st client, the open stateid of the file remains valid until the client sends the CLOSE. So the PUTFH won't fail regardless of which client sends the REMOVE.
However if the "fail with NFS4ERR_STALE is supposed to happen after REMOVE for same client" then that is not what I am seeing. If you are curious, the packet trace is here. (Look at packet#58).
https://urldefense.com/v3/__https://people.freebsd.org/*rmacklem/linux-remove.pcap__;fg!!ACWV5N9M2RV99hQ!IEcffaAAeLhuzaJUO5rQOv0jUUk4ltuMpfqT83lLFkRL9cqOZEvZ-8GGjvoqlVAQKi_FAAhsKEl5NjvS0OLJ$

Btw, in case you are curious why I am doing this testing, I am trying to figure out a good way for the FreeBSD client to handle temporary files. Typically on POSIX they are done via the syscalls:
fd = open("foo", O_CREAT ...);
unlink("foo");
write(fd,..), write(fd,..)...
read(fd,...), read(fd,...)...
close(fd);

If this happens quickly and is not too much writing, the writes copy data into buffers/pages, the reads read the data out of the pages and then it all gets deleted.

Unfortunately, the CB_RECALL forces the NFSv4.n client to do WRITE, WRITE,..COMMIT and then DELEGRETURN. Then the REMOVE throws all the data away on the NFSv4.n server.
--> As such, I really like not doing the CB_RECALL for "same client".

My concern is "what happens to the delegation after the file object ("foo") gets deleted?" It either needs to be thrown away by the NFSv4.n server or the PUTFH, DELEGRETURN needs to work after the REMOVE.
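
For anyone wanting to reproduce the temporary-file pattern described above against an NFSv4.n mount, here is a minimal C sketch of it. scratch_roundtrip() is a hypothetical helper name, and the O_RDWR | O_TRUNC flags, the 0600 mode and the lseek() between writing and reading are assumptions filled in for illustration; the thread only gives the bare syscall sequence.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, unlink it right away, then write
 * `len` bytes and read them back through the still-open descriptor.
 * Returns the number of bytes read back, or -1 on error.
 */
ssize_t
scratch_roundtrip(const char *path, const char *data, size_t len,
    char *out, size_t outlen)
{
	ssize_t n = -1;
	int fd;

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	/* The name goes away here; the file object lives until close(). */
	if (unlink(path) == 0 &&
	    write(fd, data, len) == (ssize_t)len &&
	    lseek(fd, 0, SEEK_SET) == 0)
		n = read(fd, out, outlen);
	close(fd);	/* last reference dropped, the data is discarded */
	return (n);
}
```

Calling it with a path on the NFS mount while capturing packets should show whether the server recalls the delegation when the unlink() turns into a REMOVE.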
The PUTFH and DELEGRETURN continue to work after the REMOVE. The open stateid and delegation stateid on the server are destroyed only after the client sends the DELEGRETURN and CLOSE.
Otherwise, the NFSv4.n server may get constipated by the delegations,
which might be called stale, since the file object has been deleted.
--> I can do PUTFH, GETATTR after REMOVE in the same compound,
to find out if the file object has been deleted. But then, if a
PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
away with saying "the server should just discard the delegation as
the client already has done so"?

You can try your test but I believe the PUTFH and GETATTR won't fail after the REMOVE.
-Dai
Thanks for your comments, rick
-Dai
(I don't think it is, but there is a discussion in 18.25.4 which says "When the determination above cannot be made definitively because delegations are being held, they MUST be recalled.." but everything above that is a may/MAY, so it is not obvious to me if a server really needs to care?)

Any comments? Thanks, rick
ps: I am amazed when I learn these things about NFSv4.n after all these years.