Re: [PATCH/RFC] NFSv3 handle out-of-order write replies

From: Jeff Layton <jlayton@kernel.org>
Date: 2023-01-26 21:39:08

On Fri, 2023-01-27 at 07:36 +1100, NeilBrown wrote:

On Thu, 26 Jan 2023, Jeff Layton wrote:

quoted

On Wed, 2023-01-25 at 13:03 +1100, NeilBrown wrote:

quoted

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7931fa472561..8a83d6d204ed 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h

@@ -190,6 +190,33 @@ struct nfs_inode {
 	/* Open contexts for shared mmap writes */
 	struct list_head	open_files;
 
+	/* Keep track of out-of-order replies.
+	 * The ooo array contains start/end pairs of
+	 * number from the changeid sequence when
+	 * the inodes iversion has been updated.
+	 * It also contains end/start pair (i.e. reverse order)
+	 * of sections of the changeid sequence that have
+	 * been seen in replies from the server.
+	 * Normally these should match and when both
+	 * A:B and B:A are found in ooo, they are both removed.
+	 * And if a reply with A:B causes an iversion update
+	 * of A:B, then neither are added.
+	 * When a reply has pre_change that doesn't match
+	 * iversion, then the changeid pair, and any consequent
+	 * change in iversion ARE added.  Later replies
+	 * might fill in the gaps, or possibly a gap is caused
+	 * by a change from another client.
+	 * When a file or directory is opened, if the ooo table
+	 * is not empty, then we assume the gaps were due to
+	 * another client and we invalidate the cached data.
+	 *
+	 * We can only track a limited number of concurrent gaps.
+	 * Currently that limit is 16.
+	 */
+	int ooo_cnt;
+	int ooo_max; // TRACING
+	unsigned long ooo[16][2];

Why unsigned longs here? Shouldn't these be u64?

Yes, they should be u64.  Thanks.

quoted

I guess you could argue that when we have 32-bit longs that the most
significant bits don't matter, but that's also the case with 64-bit
longs, and 32 bits would halve the space requirements.

Also, this grows each inode by 2k on a 64-bit arch! Maybe we should
dynamically allocate these things instead? If the allocation fails, then
we could just go back to marking the cache invalid and move on.

2K? 8*16*2+4*2 == 264, not 2048.

You're correct of course. I'm not sure where I got 2048.

But I agree that allocating on demand would make sense.  16 is probably
more than needed.  I don't have proper testing results from the customer
yet, but I wouldn't be surprised if a smaller number would suffice.  But
if we are allocating only on demand, it wouldn't hurt to allocate 16.

Makes sense. Is there a max number of NFS writes you can have in flight
to a single server in the client? That might inform how large an array
you need.

-- 
Jeff Layton [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help