Re: [PATCH 3/3] nfsd: simplify write verifier handling
From: Rick Macklem <hidden>
Date: 2023-02-14 22:57:54
On Tue, Feb 14, 2023 at 5:53 AM Jeff Layton [off-list ref] wrote:
On Mon, 2023-02-13 at 22:28 -0500, Trond Myklebust wrote:quoted
On Mon, 2023-02-13 at 16:49 -0800, Rick Macklem wrote:quoted
On Mon, Feb 13, 2023 at 1:14 PM Jeff Layton [off-list ref] wrote:quoted
CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@uoguelph.ca The write verifier exists to tell the client when the server may have forgotten some unstable writes. The typical way that this happens is if the server crashes, but we've also extended nfsd to change it when there are writeback errors as well. The way it works today though, we call something like vfs_fsync (e.g. for a COMMIT call) and if we get back an error, we'll reset the write verifier. This is non-optimal for a couple of reasons: 1/ There could be significant delay between an error being recorded and the reset. It would be ideal if the write verifier were to change as soon as the error was recorded. 2/ It's a bit of a waste, in that if we get a writeback error on a single inode, we'll end up resetting the write verifier for everything, even on inodes that may be fine (e.g. on a completely separate fs).Here's the snippet from RFC8881: The final portion of the result is the field writeverf. This field is the write verifier and is a cookie that the client can use to determine whether a server has changed instance state (e.g., server restart) between a call to WRITE and a subsequent call to either WRITE or COMMIT. This cookie MUST be unchanged during a single instance of the NFSv4.1 server and MUST be unique between instances of the NFSv4.1 server. If the cookie changes, then the client MUST assume that any data written with an UNSTABLE4 value for committed and an old writeverf in the reply has been lost and will need to be recovered. I've always interpreted the writeverf as "per-server" and not "per- file". Although I'll admit the above does not make that crystal clear, it does make it clear that the writeverf applies to a "server instance" and not a file or file system on the server. The FreeBSD client assumes it is "per-server" and re-writes all uncommitted writes for the server, not just ones for the file (or file system) the writeverf is replied with. (I vaguely recall Solaris does the same?) At the very least, I think you should run this past the IETF working group (nfsv4@ietf.org) to see what they say w.r.t. the writeverf being "per-file" vs "per-server".As I recall, we've already had this discussion on the IETF NFSv4 working group mailing list: https://mailarchive.ietf.org/arch/msg/nfsv4/99Ow2muMylXKWd9lzi9_BX2LJDY/ That's why I kept it a global in the first place. Now note that RFC8881 does also clarify in Section 18.3.3 that: The server must vary the value of the write verifier at each server event or instantiation that may lead to a loss of uncommitted data. Most commonly this occurs when the server is restarted; however, other events at the server may result in uncommitted data loss as well. So I feel it is quite OK to use the verifier the way we do now in order to signify that a fatal write error has occurred and that clients must resend any data that was uncommitted.Thanks, I missed that discussion. I think you guys have convinced me that we have to keep this per-server. I won't bother starting a new thread on it. It's a pity. It would have been a lot more elegant as a per-inode thing!
If you think it is worth the effort, you could propose an extension to 4.2. Something like Write_plus, Commit_plus operations. rick
Chuck, I think that means we'll just want to keep patch #1 in this series? Thanks, -- Jeff Layton [off-list ref]