Re: [PATCH] NFS: Handle missing attributes in OPEN reply

From: Olga Kornievskaia <hidden>
Date: 2023-01-09 15:38:19

On Thu, Jan 5, 2023 at 10:48 PM NeilBrown [off-list ref] wrote:

On Fri, 06 Jan 2023, Trond Myklebust wrote:

quoted

On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, Trond Myklebust wrote:

quoted

On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:

quoted

On Wed, 04 Jan 2023, Trond Myklebust wrote:

quoted


If the server starts to reply NFS4ERR_STALE to GETATTR
requests,
why do
we care about stateid values? Just mark the inode as stale
and drop
it
on the floor.

We have a valid state from the server - we really shouldn't
just
ignore
it.

Maybe it would be OK to mark the inode stale.  I don't know if
various
retry loops abort properly when the inode is stale.

Yes, they are all supposed to do that. Otherwise we would end up
looping forever in close(), for instance, since the PUTFH,
GETATTR and
CLOSE can all return NFS4ERR_STALE as well.

To mark the inode as STALE we still need to find the inode, and
that is
the key bit missing in the current code.  Once we find the inode,
we
could mark it stale, but maybe some other error resulted in the
missing
GETATTR...

It might make sense to put the new code in _nfs4_proc_open() after
the
explicit nfs4_proc_getattr() fails.  We would need to find the
inode
given only the filehandle.  Is there any easy way to do that?

Thanks,
NeilBrown

I couldn't see a consistent pattern to follow for when to mark an
inode
as stale.  Do this, on top of the previous patch, seem reasonable?

Thanks,
NeilBrown

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b441b1d14a50..04497cb42154 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c

@@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct

nfs_server *server,
                case -ESTALE:
                        if (inode != NULL && S_ISREG(inode->i_mode))
                                pnfs_destroy_layout(NFS_I(inode));
+                       if (inode)
+                               nfs_set_inode_stale(inode);

This is normally dealt with in the generic code inside
nfs_revalidate_inode(). There should be no need to duplicate it here.

quoted

                        break;
                case -NFS4ERR_DELEG_REVOKED:
                case -NFS4ERR_ADMIN_REVOKED:

@@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct

nfs4_opendata *data,
                        return status;
        }
        if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
+               struct inode *inode = nfs4_get_inode_by_stateid(
+                       &data->o_res.stateid,
+                       data->owner);

There shouldn't be a need to go looking for open descriptors here,
because they will hit ESTALE at some point later anyway.

The problem is that they don't hit ESTALE later.  Unless we update our
stored stateid with the result of the OPEN, we can use the old stateid,
and get the corresponding error.

The only way to avoid the infinite loop is to either mark the inode as
stale, or update the state-id.  For either of those we need to find the
inode.  We don't have fileid so we cannot use iget.  We do have file
handle and state-id.

Maybe you are saying this is a server bug that the client cannot be
expect to cope with at all, and that an infinite loop is a valid client
response to this particular server behaviour.  In that case, I guess no
patch is needed.

I'm not arguing that the server should do something else. But I would
like to talk about it from the spec perspective. When PUTFH+WRITE is
sent to the server (with an incorrect stateid) and it's processing the
WRITE compound (as the spec doesn't require the server to validate a
filehandle on PUTFH. The spec says PUTFH is to "set" the correct
filehandle), the server is dealing with 2 errors that it can possibly
return to the client ERR_STALE and ERR_OLD_STATEID. There is nothing
in the spec that speaks to the orders of errors to be returned. Of
course I'd like to say that the server should prioritize ERR_STALE
over ERR_OLD_STATEID (simply because it's a MUST in the spec and
ERR_OLD_STATEIDs are written as "rules").

NeilBrown

quoted

If we're going to change anything, I'd rather see us return -EACCES and
-ESTALE from the decode_access() and decode_getfattr() calls in
nfs4_xdr_dec_open() (and only those errors from those two calls!) so
that we can skip the unnecessary getattr call here.

In fact, the only case that this extra getattr should be trying to
address is the one where the server returns NFS4ERR_DELAY to either the
decode_access() or the decode_getfattr() calls specifically, and where
we therefore don't want to replay the entire open call.

quoted

                nfs4_sequence_free_slot(&o_res->seq_res);
-               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
NULL);
+               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
inode);
+               iput(inode);
        }
        return 0;
 }

@@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server

*server, struct nfs_fh *fhandle,
 {
        struct nfs4_exception exception = {
                .interruptible = true,
+               .inode = inode,
        };
        int err;
        do {

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help