Thread (26 messages) 26 messages, 3 authors, 2022-11-01

Re: [PATCH v3 2/4] nfsd: rework refcounting in filecache

From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2022-11-01 13:58:25

On Oct 28, 2022, at 4:13 PM, Jeff Layton [off-list ref] wrote:

On Fri, 2022-10-28 at 19:49 +0000, Chuck Lever III wrote:
quoted
quoted
On Oct 28, 2022, at 2:57 PM, Jeff Layton [off-list ref] wrote:
quoted
I'm still not sold on the idea of a synchronous flush in nfsd_file_free().
I think that we need to call this there to ensure that writeback errors
are handled. I worry that if try to do this piecemeal, we could end up
missing errors when they fall off the LRU.
quoted
That feels like a deadlock waiting to happen and quite difficult to
reproduce because I/O there is rarely needed. It could help to put a
might_sleep() in nfsd_file_fsync(), at least, but I would prefer not to
drive I/O in that path at all.
I don't quite grok the potential for a deadlock here. nfsd_file_free
already has to deal with blocking activities due to it effective doing a
close(). This is just another one. That's why nfsd_file_put has a
might_sleep in it (to warn its callers).

What's the deadlock scenario you envision?
I never answered this question.

I'll say up front that I believe this problem exists in the current code
base, so what follows is meant to document an existing issue rather than
a problem with this patch series.

The filecache sets up a shrinker callback. This callback uses the same
or similar code paths as the filecache garbage collector.

Dai has found that trying to fsync inside a shrinker callback will
lead to deadlocks on some filesystems (notably I believe he was testing
btrfs at the time).

To address this, the filecache shrinker callback could avoid evicting
nfsd_file items that are open for write.


--
Chuck Lever


Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help