Re: [PATCH] NFS: Handle missing attributes in OPEN reply
From: Olga Kornievskaia <hidden>
Date: 2023-01-09 15:38:19
On Thu, Jan 5, 2023 at 10:48 PM NeilBrown [off-list ref] wrote:
On Fri, 06 Jan 2023, Trond Myklebust wrote:quoted
On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:quoted
On Wed, 04 Jan 2023, NeilBrown wrote:quoted
On Wed, 04 Jan 2023, Trond Myklebust wrote:quoted
On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:quoted
On Wed, 04 Jan 2023, Trond Myklebust wrote:quoted
If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do we care about stateid values? Just mark the inode as stale and drop it on the floor.We have a valid state from the server - we really shouldn't just ignore it. Maybe it would be OK to mark the inode stale. I don't know if various retry loops abort properly when the inode is stale.Yes, they are all supposed to do that. Otherwise we would end up looping forever in close(), for instance, since the PUTFH, GETATTR and CLOSE can all return NFS4ERR_STALE as well.To mark the inode as STALE we still need to find the inode, and that is the key bit missing in the current code. Once we find the inode, we could mark it stale, but maybe some other error resulted in the missing GETATTR... It might make sense to put the new code in _nfs4_proc_open() after the explicit nfs4_proc_getattr() fails. We would need to find the inode given only the filehandle. Is there any easy way to do that? Thanks, NeilBrownI couldn't see a consistent pattern to follow for when to mark an inode as stale. Do this, on top of the previous patch, seem reasonable? Thanks, NeilBrowndiff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index b441b1d14a50..04497cb42154 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c@@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(structnfs_server *server, case -ESTALE: if (inode != NULL && S_ISREG(inode->i_mode)) pnfs_destroy_layout(NFS_I(inode)); + if (inode) + nfs_set_inode_stale(inode);This is normally dealt with in the generic code inside nfs_revalidate_inode(). There should be no need to duplicate it here.quoted
break; case -NFS4ERR_DELEG_REVOKED: case -NFS4ERR_ADMIN_REVOKED:@@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(structnfs4_opendata *data, return status; } if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) { + struct inode *inode = nfs4_get_inode_by_stateid( + &data->o_res.stateid, + data->owner);There shouldn't be a need to go looking for open descriptors here, because they will hit ESTALE at some point later anyway.The problem is that they don't hit ESTALE later. Unless we update our stored stateid with the result of the OPEN, we can use the old stateid, and get the corresponding error. The only way to avoid the infinite loop is to either mark the inode as stale, or update the state-id. For either of those we need to find the inode. We don't have fileid so we cannot use iget. We do have file handle and state-id. Maybe you are saying this is a server bug that the client cannot be expect to cope with at all, and that an infinite loop is a valid client response to this particular server behaviour. In that case, I guess no patch is needed.
I'm not arguing that the server should do something else. But I would like to talk about it from the spec perspective. When PUTFH+WRITE is sent to the server (with an incorrect stateid) and it's processing the WRITE compound (as the spec doesn't require the server to validate a filehandle on PUTFH. The spec says PUTFH is to "set" the correct filehandle), the server is dealing with 2 errors that it can possibly return to the client ERR_STALE and ERR_OLD_STATEID. There is nothing in the spec that speaks to the orders of errors to be returned. Of course I'd like to say that the server should prioritize ERR_STALE over ERR_OLD_STATEID (simply because it's a MUST in the spec and ERR_OLD_STATEIDs are written as "rules").
NeilBrownquoted
If we're going to change anything, I'd rather see us return -EACCES and -ESTALE from the decode_access() and decode_getfattr() calls in nfs4_xdr_dec_open() (and only those errors from those two calls!) so that we can skip the unnecessary getattr call here. In fact, the only case that this extra getattr should be trying to address is the one where the server returns NFS4ERR_DELAY to either the decode_access() or the decode_getfattr() calls specifically, and where we therefore don't want to replay the entire open call.quoted
nfs4_sequence_free_slot(&o_res->seq_res); - nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, NULL); + nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, inode); + iput(inode); } return 0; }@@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server*server, struct nfs_fh *fhandle, { struct nfs4_exception exception = { .interruptible = true, + .inode = inode, }; int err; do {-- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com