Re: [PATCH] NFS: Handle missing attributes in OPEN reply

From: Olga Kornievskaia <hidden>
Date: 2023-01-09 16:08:24

On Mon, Jan 9, 2023 at 10:47 AM Trond Myklebust [off-list ref] wrote:

quoted

On Jan 9, 2023, at 10:33, Olga Kornievskaia [off-list ref] wrote:

On Thu, Jan 5, 2023 at 10:48 PM NeilBrown [off-list ref] wrote:

quoted

On Fri, 06 Jan 2023, Trond Myklebust wrote:

quoted

On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, Trond Myklebust wrote:

quoted

On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, Trond Myklebust wrote:

quoted


If the server starts to reply NFS4ERR_STALE to GETATTR
requests,
why do
we care about stateid values? Just mark the inode as stale
and drop
it
on the floor.

We have a valid state from the server - we really shouldn't
just
ignore
it.

Maybe it would be OK to mark the inode stale.  I don't know if
various
retry loops abort properly when the inode is stale.

Yes, they are all supposed to do that. Otherwise we would end up
looping forever in close(), for instance, since the PUTFH,
GETATTR and
CLOSE can all return NFS4ERR_STALE as well.

To mark the inode as STALE we still need to find the inode, and
that is
the key bit missing in the current code.  Once we find the inode,
we
could mark it stale, but maybe some other error resulted in the
missing
GETATTR...

It might make sense to put the new code in _nfs4_proc_open() after
the
explicit nfs4_proc_getattr() fails.  We would need to find the
inode
given only the filehandle.  Is there any easy way to do that?

Thanks,
NeilBrown

I couldn't see a consistent pattern to follow for when to mark an
inode
as stale.  Do this, on top of the previous patch, seem reasonable?

Thanks,
NeilBrown

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b441b1d14a50..04497cb42154 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c

@@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct

nfs_server *server,
               case -ESTALE:
                       if (inode != NULL && S_ISREG(inode->i_mode))
                               pnfs_destroy_layout(NFS_I(inode));
+                       if (inode)
+                               nfs_set_inode_stale(inode);

This is normally dealt with in the generic code inside
nfs_revalidate_inode(). There should be no need to duplicate it here.

quoted

                       break;
               case -NFS4ERR_DELEG_REVOKED:
               case -NFS4ERR_ADMIN_REVOKED:

@@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct

nfs4_opendata *data,
                       return status;
       }
       if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
+               struct inode *inode = nfs4_get_inode_by_stateid(
+                       &data->o_res.stateid,
+                       data->owner);

There shouldn't be a need to go looking for open descriptors here,
because they will hit ESTALE at some point later anyway.

The problem is that they don't hit ESTALE later.  Unless we update our
stored stateid with the result of the OPEN, we can use the old stateid,
and get the corresponding error.

The only way to avoid the infinite loop is to either mark the inode as
stale, or update the state-id.  For either of those we need to find the
inode.  We don't have fileid so we cannot use iget.  We do have file
handle and state-id.

Maybe you are saying this is a server bug that the client cannot be
expect to cope with at all, and that an infinite loop is a valid client
response to this particular server behaviour.  In that case, I guess no
patch is needed.

I'm not arguing that the server should do something else. But I would
like to talk about it from the spec perspective. When PUTFH+WRITE is
sent to the server (with an incorrect stateid) and it's processing the
WRITE compound (as the spec doesn't require the server to validate a
filehandle on PUTFH. The spec says PUTFH is to "set" the correct
filehandle), the server is dealing with 2 errors that it can possibly
return to the client ERR_STALE and ERR_OLD_STATEID. There is nothing
in the spec that speaks to the orders of errors to be returned. Of
course I'd like to say that the server should prioritize ERR_STALE
over ERR_OLD_STATEID (simply because it's a MUST in the spec and
ERR_OLD_STATEIDs are written as "rules").

I disagree for the reason already pointed to in the spec. There is nothing in the spec that appears to allow the PUTFH to return anything other than NFS4ERR_STALE after the file has been deleted (and yes, RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH). PUTFH is definitely required to validate the file handle, since it is the ONLY operation that can return NFS4ERR_BADHANDLE.

We are talking about 4.0 and not 4.1. In 4.0 all operations that use
PUTFH can return ERR_BADHANDLE.

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help