Thread (20 messages) 20 messages, 4 authors, 2023-01-09

Re: [PATCH] NFS: Handle missing attributes in OPEN reply

From: NeilBrown <hidden>
Date: 2023-01-05 22:02:50

On Thu, 05 Jan 2023, Olga Kornievskaia wrote:
On Tue, Jan 3, 2023 at 9:34 PM NeilBrown [off-list ref] wrote:
quoted
On Wed, 04 Jan 2023, NeilBrown wrote:
quoted
On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
quoted
On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust [off-list ref] wrote:
quoted

If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
we care about stateid values?
It is acceptable for the server to return ESTALE to the GETATTR after
the processing the open (due to a REMOVE that comes in) and that open
generating a valid stateid which client should care about when there
are pre-existing opens. The server will keep the state of an existing
opens valid even if the file is removed. Which is what's happening,
the previous open is being used for IO but the stateid is updated on
the server but not on the client.
I agree that it is acceptable to return ESTALE to the GETATTR, but
having done that I don't think it is acceptable for a PUTFH of the same
filehandle to succeed.  Certainly any attempt to again use the
filehandle after the PUTFH should fail with NFS4ERR_STALE.

RFC7530 says

13.1.2.7.  NFS4ERR_STALE (Error Code 70)

   The current or saved filehandle value designating an argument to the
   current operation is invalid.  The file system object referred to by
   that filehandle no longer exists, or access to it has been revoked.

So the file doesn't exist or access has been revoked.  So any writes
should fail.  Failing with OLD_STATEID is weird - and having writes
succeed if we use the correct stateid is also odd.  Failing with STALE
would be perfectly sensible and I suspect the Linux client would handle
that just fine.
I checked a recent tcpdump (with patched SLE kernel talking to Netapp)
and I see that the writes don't succeed after the first NFS4ERR_STALE.

If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.

So it seems that it just has these two checks in the wrong order.
Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?
In the traces I have, Netapp never returns ERR_STALE on PUTFH.  Of
course the PUTFH operation doesn't have a stateid, so the "if the
stateid is correct" is not meaningful.

ACCESS, READ, WRITE, SETATTR, and GETATTR are the only ops that I have
seen to result in ERR_STALE.

ACCESS and GETATTR don't have a stateid.
READ, WRITE, and SETATTR do.  These get OLD_STATEID if the stateid is
old, and only get STALE if the stateid is current.

If it's the WRITE operation that returns an error and if the server
has multiple errors (ie bad statid and bad file handle)` then I don't
think the spec says which error is more important. In this case, I
think the server should fail PUTFH with ERR_STALE.
I agree.  If the PUTFH returned STALE there would not be a problem.
Even if a race resulted in the PUTFH succeeding and the WRITE getting
OLD_STATEID, Linux-NFS would retry an the new PUTFH could then fail
correctly.

NeilBrown
quoted
NeilBrown
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help