Re: [PATCH 3/3] nfsd: simplify write verifier handling

From: Rick Macklem <hidden>
Date: 2023-02-14 22:57:54

On Tue, Feb 14, 2023 at 5:53 AM Jeff Layton [off-list ref] wrote:

On Mon, 2023-02-13 at 22:28 -0500, Trond Myklebust wrote:

quoted

On Mon, 2023-02-13 at 16:49 -0800, Rick Macklem wrote:

quoted

On Mon, Feb 13, 2023 at 1:14 PM Jeff Layton [off-list ref]
wrote:

quoted

CAUTION: This email originated from outside of the University of
Guelph. Do not click links or open attachments unless you recognize
the sender and know the content is safe. If in doubt, forward
suspicious emails to IThelp@uoguelph.ca


The write verifier exists to tell the client when the server may
have
forgotten some unstable writes. The typical way that this happens
is if
the server crashes, but we've also extended nfsd to change it when
there
are writeback errors as well.

The way it works today though, we call something like vfs_fsync
(e.g.
for a COMMIT call) and if we get back an error, we'll reset the
write
verifier.

This is non-optimal for a couple of reasons:

1/ There could be significant delay between an error being
recorded and the reset. It would be ideal if the write verifier
were to
change as soon as the error was recorded.

2/ It's a bit of a waste, in that if we get a writeback error on a
single inode, we'll end up resetting the write verifier for
everything,
even on inodes that may be fine (e.g. on a completely separate fs).

Here's the snippet from RFC8881:
   The final portion of the result is the field writeverf.  This
field
   is the write verifier and is a cookie that the client can use to
   determine whether a server has changed instance state (e.g.,
server
   restart) between a call to WRITE and a subsequent call to either
   WRITE or COMMIT.  This cookie MUST be unchanged during a single
   instance of the NFSv4.1 server and MUST be unique between
instances
   of the NFSv4.1 server.  If the cookie changes, then the client
MUST
   assume that any data written with an UNSTABLE4 value for committed
   and an old writeverf in the reply has been lost and will need to
be
   recovered.

I've always interpreted the writeverf as "per-server" and not  "per-
file".
Although I'll admit the above does not make that crystal clear, it
does make
it clear that the writeverf applies to a "server instance" and not a
file or
file system on the server.

The FreeBSD client assumes it is "per-server" and re-writes all
uncommitted
writes for the server, not just ones for the file (or file system)
the
writeverf is
replied with.  (I vaguely recall Solaris does the same?)

At the very least, I think you should run this past the IETF working
group
(nfsv4@ietf.org) to see what they say w.r.t. the writeverf being
"per-file" vs
"per-server".

As I recall, we've already had this discussion on the IETF NFSv4
working group mailing list:
https://mailarchive.ietf.org/arch/msg/nfsv4/99Ow2muMylXKWd9lzi9_BX2LJDY/


That's why I kept it a global in the first place.

Now note that RFC8881 does also clarify in Section 18.3.3 that:


   The server must vary the value of the write
   verifier at each server event or instantiation that may lead to a
   loss of uncommitted data.  Most commonly this occurs when the server
   is restarted; however, other events at the server may result in
   uncommitted data loss as well.

So I feel it is quite OK to use the verifier the way we do now in order
to signify that a fatal write error has occurred and that clients must
resend any data that was uncommitted.

Thanks, I missed that discussion. I think you guys have convinced me
that we have to keep this per-server. I won't bother starting a new
thread on it.

It's a pity. It would have been a lot more elegant as a per-inode thing!

If you think it is worth the effort, you could propose an extension to
4.2. Something like Write_plus, Commit_plus operations.

rick

Chuck, I think that means we'll just want to keep patch #1 in this
series?

Thanks,
--
Jeff Layton [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help