Re: [PATCH v3 2/4] nfsd: rework refcounting in filecache
From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2022-11-01 13:58:25
On Oct 28, 2022, at 4:13 PM, Jeff Layton [off-list ref] wrote: On Fri, 2022-10-28 at 19:49 +0000, Chuck Lever III wrote:quoted
quoted
On Oct 28, 2022, at 2:57 PM, Jeff Layton [off-list ref] wrote:quoted
I'm still not sold on the idea of a synchronous flush in nfsd_file_free().I think that we need to call this there to ensure that writeback errors are handled. I worry that if try to do this piecemeal, we could end up missing errors when they fall off the LRU.quoted
That feels like a deadlock waiting to happen and quite difficult to reproduce because I/O there is rarely needed. It could help to put a might_sleep() in nfsd_file_fsync(), at least, but I would prefer not to drive I/O in that path at all.I don't quite grok the potential for a deadlock here. nfsd_file_free already has to deal with blocking activities due to it effective doing a close(). This is just another one. That's why nfsd_file_put has a might_sleep in it (to warn its callers). What's the deadlock scenario you envision?
I never answered this question. I'll say up front that I believe this problem exists in the current code base, so what follows is meant to document an existing issue rather than a problem with this patch series. The filecache sets up a shrinker callback. This callback uses the same or similar code paths as the filecache garbage collector. Dai has found that trying to fsync inside a shrinker callback will lead to deadlocks on some filesystems (notably I believe he was testing btrfs at the time). To address this, the filecache shrinker callback could avoid evicting nfsd_file items that are open for write. -- Chuck Lever