Re: [PATCH/RFC] NFSv3 handle out-of-order write replies
From: Jeff Layton <jlayton@kernel.org>
Date: 2023-01-26 21:39:08
On Fri, 2023-01-27 at 07:36 +1100, NeilBrown wrote:
On Thu, 26 Jan 2023, Jeff Layton wrote:quoted
On Wed, 2023-01-25 at 13:03 +1100, NeilBrown wrote:quoted
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 7931fa472561..8a83d6d204ed 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h@@ -190,6 +190,33 @@ struct nfs_inode { /* Open contexts for shared mmap writes */ struct list_head open_files; + /* Keep track of out-of-order replies. + * The ooo array contains start/end pairs of + * number from the changeid sequence when + * the inodes iversion has been updated. + * It also contains end/start pair (i.e. reverse order) + * of sections of the changeid sequence that have + * been seen in replies from the server. + * Normally these should match and when both + * A:B and B:A are found in ooo, they are both removed. + * And if a reply with A:B causes an iversion update + * of A:B, then neither are added. + * When a reply has pre_change that doesn't match + * iversion, then the changeid pair, and any consequent + * change in iversion ARE added. Later replies + * might fill in the gaps, or possibly a gap is caused + * by a change from another client. + * When a file or directory is opened, if the ooo table + * is not empty, then we assume the gaps were due to + * another client and we invalidate the cached data. + * + * We can only track a limited number of concurrent gaps. + * Currently that limit is 16. + */ + int ooo_cnt; + int ooo_max; // TRACING + unsigned long ooo[16][2];Why unsigned longs here? Shouldn't these be u64?Yes, they should be u64. Thanks.quoted
I guess you could argue that when we have 32-bit longs that the most significant bits don't matter, but that's also the case with 64-bit longs, and 32 bits would halve the space requirements. Also, this grows each inode by 2k on a 64-bit arch! Maybe we should dynamically allocate these things instead? If the allocation fails, then we could just go back to marking the cache invalid and move on.2K? 8*16*2+4*2 == 264, not 2048.
You're correct of course. I'm not sure where I got 2048.
But I agree that allocating on demand would make sense. 16 is probably more than needed. I don't have proper testing results from the customer yet, but I wouldn't be surprised if a smaller number would suffice. But if we are allocating only on demand, it wouldn't hurt to allocate 16.
Makes sense. Is there a max number of NFS writes you can have in flight to a single server in the client? That might inform how large an array you need.
-- Jeff Layton [off-list ref]