Re: [Linux-cachefs] [BUG] fscache writing but not reading
From: David Wysochanski <hidden>
Date: 2023-05-24 16:14:54
Subsystem: filesystems (vfs and infrastructure), nfs, sunrpc, and lockd clients, the rest
Maintainers: Alexander Viro, Christian Brauner, Trond Myklebust, Anna Schumaker, Linus Torvalds

On Fri, May 19, 2023 at 7:53 AM David Wysochanski [off-list ref] wrote:
> On Thu, May 18, 2023 at 4:21 PM Chris Chilvers [off-list ref] wrote:
> > On Tue, 16 May 2023 at 20:28, David Wysochanski [off-list ref] wrote:
> > > On Tue, May 16, 2023 at 11:42 AM Chris Chilvers [off-list ref] wrote:
> > > > While testing the fscache performance fixes [1] that were merged
> > > > into 6.4-rc1 it appears that the caching no longer works. The
> > > > client will write to the cache but never reads.
> > >
> > > Thanks for the report. If you reboot do you see reads from the cache?
> >
> > On the first read after a reboot it uses the cache, but subsequent
> > reads do not use the cache.
> >
> > > You can check if the cache is being read from by looking in
> > > /proc/fs/fscache/stats at the "IO" line:
> > >
> > > # grep IO /proc/fs/fscache/stats
> > > IO : rd=80030 wr=0
> >
> > Running the tests 4 times (twice before reboot, and twice after)
> > gives the following metrics:
> >
> >      FS-Cache I/O (delta)
> > Run      rd        wr
> >   1       0    39,250
> >   2     130    38,894
> >   3  39,000         0
> >   4      72    38,991
> >
> > > Can you share:
> > > 1. NFS server you're using (is it localhost or something else)
> > > 2. NFS version
> >
> > The NFS server and client are separate VMs on the same network.
> >
> > Server NFS version: Ubuntu 22.04 jammy, kernel 5.15.0-1021-gcp
> > Client NFS version: Ubuntu 22.04 jammy, kernel 6.4.0-060400rc1-generic
> >   (https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.4-rc1/)
> > Client nfs-utils: 2.6.3-rc6
> > Client cachefilesd: 0.10.10-0.2ubuntu1
> >
> > > In addition to checking the above for the reads from the cache,
> > > you can also see whether NFS reads are going over the wire pretty
> > > easily with a similar technique. Copy /proc/self/mountstats to a
> > > file before your test, then make a second copy after the test,
> > > then run mountstats as follows:
> > >
> > > mountstats -S /tmp/mountstats.1 -f /tmp/mountstats.2
> >
> > app read = applications read bytes via read(2)
> > client read = client read bytes via NFS READ
> >
> > Run         app read      client read
> >   1  322,122,547,200  322,122,547,200
> >   2  322,122,547,200  321,048,805,376
> >   3  322,122,547,200                0
> >   4  322,122,547,200  321,593,053,184
> >
> > I've put the full data in a GitHub gist, along with a graph
> > collected from a metrics agent:
> > https://gist.github.com/chilversc/54eb76155ad37b66cb85186e7449beaa
> > https://gist.githubusercontent.com/chilversc/54eb76155ad37b66cb85186e7449beaa/raw/09828c596d0cfc44bc0eb67f40e4033db202326e/metrics.png
>
> Thanks Chris for all this info. I see you're using NFSv3 so I'll
> focus on that, and review all this info for clues.
> I also have been working on some updated test cases and see some very
> unusual behavior like you're reporting.
>
> I also confirmed that adding the two patches for "Issue #1" onto
> 6.4-rc1 resolves _most_ of the caching issues.
> However, even after those patches, in some limited instances, there
> are still NFS reads over the wire when there should only be reads
> from the cache. There may be multiple bugs here.

I actually misspoke regarding "multiple bugs", as I forgot to add a small NFS hunk (see below) that is needed on top of dhowells' 2nd patch (v6 of "mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). After the small hunk below was added on top of dhowells' 2nd patch, all my tests pass.

I've also enhanced my existing tests to check that NFS READs, fscache READs, and fscache WRITEs are as expected. And I added an additional test that creates files the size of RAM, reads them multiple times, and checks that the ops are as expected.

So I'm confident that if we add dhowells' 2 patches, plus the below hunk for NFS, these caching issues will be resolved.

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 8c35d88a84b1..d4a20748b14f 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -180,6 +180,10 @@ void nfs_fscache_init_inode(struct inode *inode)
 					       &auxdata,      /* aux_data */
 					       sizeof(auxdata),
 					       i_size_read(inode));
+
+	if (netfs_inode(inode)->cache)
+		mapping_set_release_always(inode->i_mapping);
+
 }
 
 /*
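
For anyone who wants to repeat the cache check quoted earlier in this thread, here is a minimal sketch of a before/after comparison of the "IO" line from /proc/fs/fscache/stats. It assumes the line has the `rd=<n> wr=<n>` fields shown in the grep output above. The function name `fscache_io_delta` and the snapshot paths are hypothetical, not part of any existing tool or of my test suite.

```shell
# Sketch only: print the change in fscache "IO" counters between two
# saved copies of /proc/fs/fscache/stats.
# Assumes a line such as "IO : rd=80030 wr=0"; the function name and the
# snapshot file paths are illustrative.
#
# Typical use:
#   cat /proc/fs/fscache/stats > /tmp/fscache.1
#   ... run the read test ...
#   cat /proc/fs/fscache/stats > /tmp/fscache.2
#   fscache_io_delta /tmp/fscache.1 /tmp/fscache.2
fscache_io_delta() {
    awk '
        # Match the IO line and pick out the rd= and wr= fields per file.
        /^IO[ \t]*:/ {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "rd") rd[FILENAME] = kv[2]
                if (kv[1] == "wr") wr[FILENAME] = kv[2]
            }
        }
        # ARGV[1] is the "before" snapshot, ARGV[2] the "after" snapshot.
        END {
            printf "rd=%d wr=%d\n", rd[ARGV[2]] - rd[ARGV[1]], wr[ARGV[2]] - wr[ARGV[1]]
        }
    ' "$1" "$2"
}
```

On a re-read of already cached files, one would hope to see the rd delta grow while the wr delta stays at or near zero; pairing this with the mountstats comparison shows whether the same reads also went over the wire.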