Re: [PATCH] NFS: Handle missing attributes in OPEN reply

From: Trond Myklebust <trondmy@kernel.org>
Date: 2023-01-04 16:07:59

On Wed, 2023-01-04 at 10:09 -0500, Olga Kornievskaia wrote:

On Tue, Jan 3, 2023 at 9:17 PM Trond Myklebust [off-list ref]
wrote:

quoted

On Tue, 2023-01-03 at 21:02 -0500, Olga Kornievskaia wrote:

quoted

On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust
[off-list ref]
wrote:

quoted

On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:

quoted

If a NFSv4 OPEN reply reports that the file was successfully
opened
but
the subsequent GETATTR fails, Linux-NFS will attempt a stand-
alone
GETATTR request.  If that also fails, handling of the reply
is
aborted
with error -EAGAIN and the open is attempted again from the
start.

This leaves the server with an active state (because the OPEN
operation
succeeded) which the client doesn't know about.  If the open-
owner
(local user) did not have the file already open, this has
minimal
consequences for the client and only causes the server to
spend
resources on an open state that will never be used or
explicitly
closed.

If the open-owner DID already have the file open, then it
will
hold a
reference to the open-state for that file, but the seq-id in
the
state-id will now be out-of-sync with the server.  The server
will
have
incremented the seq-id, but the client will not have
noticed.  So
when
the client next attempts to access the file using that state
(READ,
WRITE, SETATTR), the attempt will fail with
NFS4ERR_OLD_STATEID.

The Linux-client assumes this error is due to a race and
simply
retries
on the basis that the local state-id information should have
been
updated by another thread.  This basis is invalid in this
case
and
the
result is an infinite loop attempting IO and getting
OLD_STATEID.

This has been observed with a NetApp Filer as the server
(ONTAP
9.8
p5,
using NFSv4.0).  The client is creating, writing, and
unlinking a
particular file from multiple clients (.bash_history).  If a
new
OPEN
from one client races with a REMOVE from another client while
the
first
client already has the file open, the Filer can report
success
for
the
OPEN op, but NFS4ERR_STALE for the ACCESS or GETATTR ops in
the
OPEN
request.  This gets the seq-id out-of-sync and a subsequent
write
to
the
other open on the first client causes the infinite loop to
occur.

The reason that the client returns -EAGAIN is that it needs
to
find
the
inode so it can find the associated state to update the seq-
id,
but
the
inode lookup requires the file-id which is provided in the
GETATTR
reply.  Without the file-id normal inode lookup cannot be
used.

This patch changes the lookup so that when the file-id is not
available
the list of states owned by the open-owner is examined to
find
the
state
with the correct state-id (ignoring the seq-id part of the
state-
id).
If this is found it is used just as when a normal inode
lookup
finds
an
inode.  If it isn't found, -EAGAIN is returned as before.

This bug can be demonstrated by modifying the Linux NFS
server as
follows:

1/ The second time a file is opened, unlink it.  This
simulates
   a race with another client, without needing to have a
race:

    --- a/fs/nfsd/nfs4proc.c
    +++ b/fs/nfsd/nfs4proc.c
    @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp,
struct
nfsd4_compound_state *cstate,
        if (reclaim && !status)
                nn->somebody_reclaimed = true;
     out:
    +   if (!status && open->op_stateid.si_generation > 1) {
    +           printk("Opening gen %d\n", (int)open-

quoted

op_stateid.si_generation);

    +           vfs_unlink(mnt_user_ns(resfh->fh_export-

quoted

ex_path.mnt),

    +                      resfh->fh_dentry->d_parent-

quoted

d_inode,

    +                      resfh->fh_dentry, NULL);
    +   }
        if (open->op_filp) {
                fput(open->op_filp);
                open->op_filp = NULL;

2/ When a GETATTR op is attempted on an unlinked file, return
ESTALE

    @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst
*rqstp,
struct
nfsd4_compound_state *cstate,
        if (status)
                return status;

    +   if (cstate->current_fh.fh_dentry->d_inode->i_nlink ==
0)
{
    +           printk("Return Estale for unlinked file\n");
    +           return nfserr_stale;
    +   }
    +
        if (getattr->ga_bmval[1] &
NFSD_WRITEONLY_ATTRS_WORD1)
                return nfserr_inval;

Then mount the filesystem and

  Thread 1            Thread 2
  open a file
                      open the same file (will fail)
  write to that file

I use this shell fragment, using 'sleep' for synchronisation.
The use of /bin/echo ensures the write is flushed when
/bin/echo
closes
the fd on exit.

    (
        /bin/echo hello
        sleep 3
        /bin/echo there
    ) > /import/A/afile &
    sleep 3
    cat /import/A/afile

Probably when the OPEN succeeds, the GETATTR fails, and we
don't
already
have the state open, we should explicitly close the state.
Leaving
it
open could cause problems if, for example, the server revoked
it
and
signalled the client that there was a revoked state.  The
client
would
not be able to find the state that needed to be
relinquished.  I
haven't
attempted to implement this.


If the server starts to reply NFS4ERR_STALE to GETATTR
requests,
why do
we care about stateid values?

It is acceptable for the server to return ESTALE to the GETATTR
after
the processing the open (due to a REMOVE that comes in) and that
open
generating a valid stateid which client should care about when
there
are pre-existing opens. The server will keep the state of an
existing
opens valid even if the file is removed. Which is what's
happening,
the previous open is being used for IO but the stateid is updated
on
the server but not on the client.

quoted

Just mark the inode as stale and drop it
on the floor.

Why would that be correct? Any pre-existing opens should continue
operating, thus the inode can't be marked stale. We don't do it
now
(silly rename allows preexisting IO to continue).

quoted

If the server tries to declare the state as revoked, then it is
clearly
borken, so let NetApp fix their own bug.

The server does not declare the state revoked. The open succeeded
and
its state's seqid was updated but the client is using an old
stateid.
Server isn't at fault here.

If the client can't send a GETATTR, then it can't revalidate
attributes, do I/O, and it can't even close the file. So we're not
going to waste time and effort trying to support this, whether or
not
NetApp thinks it is a good idea.

That makes sense but I think the server should fail PUTFHs in the
compounds after the REMOVE was processed, not the other (GETATTR,
WRITE) operations, right?

In theory, the server is free to reply NFS4_OK to PUTFH, and then
NFS4ERR_STALE to the GETATTR, because a COMPOUND is not atomic w.r.t.
execution of the individual operations. i.e. if the file is deleted by
some other RPC call between the execution of the PUTFH and the GETATTR
in the COMPOUND, then this situation can happen.

However, my point is that once the server starts handing out
NFS4ERR_STALE errors, then the client can assume that the filehandle
being used is bad, because under RFC5661, Section 4.2.2, the server
MUST return NFS4ERR_STALE if it is presented with that filehandle
again.
IOW: the client should just toss the file associated with that
filehandle out of its inode caches, and forget any state associated
with that filehandle, because all future attempts to use it MUST result
in NFS4ERR_STALE.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help