Thread (32 messages) 32 messages, 5 authors, 2023-01-18

Re: [PATCH v5 3/5] nfsd: rework refcounting in filecache

From: Jeff Layton <jlayton@kernel.org>
Date: 2022-11-01 22:42:38

On Wed, 2022-11-02 at 09:05 +1100, NeilBrown wrote:
On Wed, 02 Nov 2022, Jeff Layton wrote:
quoted
On Wed, 2022-11-02 at 08:23 +1100, NeilBrown wrote:
quoted
On Wed, 02 Nov 2022, Jeff Layton wrote:
quoted
The filecache refcounting is a bit non-standard for something searchable
by RCU, in that we maintain a sentinel reference while it's hashed. This
in turn requires that we have to do things differently in the "put"
depending on whether its hashed, which we believe to have led to races.

There are other problems in here too. nfsd_file_close_inode_sync can end
up freeing an nfsd_file while there are still outstanding references to
it, and there are a number of subtle ToC/ToU races.

Rework the code so that the refcount is what drives the lifecycle. When
the refcount goes to zero, then unhash and rcu free the object.

With this change, the LRU carries a reference. Take special care to
deal with it when removing an entry from the list.
The refcounting and lru management all look sane here.

You need to have moved the final put (and corresponding fsync) to
different threads.  I think I see you and Chuck discussing that and I
have no sense of what is "right". 
Yeah, this is a tough call. I get Chuck's reticence.

One thing we could consider is offloading the SYNC_NONE writeback
submission to a workqueue. I'm not sure though whether that's a win --
it might just add needless context switches. OTOH, that would make it
fairly simple to kick off writeback when the REFERENCED flag is cleared,
which would probably be the best time to do it.

An entry that ends up being harvested by the LRU scanner is going to be
touched by it at least twice: once to clear the REFERENCED flag, and
again ~2s later to reap it.

If we schedule writeback when we clear the flag then we have a pretty
good indication that nothing else is going to be using it (though I
think we need to clear REFERENCED even when nfsd_file_check_writeback
returns true -- I'll fix that in the coming series).

In any case, I'd probably like to do that sort of change in a separate
series after we get the first part sorted.
quoted
But it would be nice to explain in
the comment what is being moved and why, so I could then confirm that
the code matches the intent.
I'm happy to add comments, but I'm a little unclear on what you're
confused by here. It's a bit too big of a patch for me to give a full
play-by-play description. Can you elaborate on what you'd like to see?
I don't need blow-by-blow, but all the behavioural changes should at
least be flagged in the intro, and possibly explained.
The one I particularly noticed is in nfsd_file_close_inode() which
previously used nfsd_file_dispose_list() which hands the final close off
to nfsd_filecache_wq.
But this patch now does the final close in-line so an fsnotify event
might now do the fsync.  I was assuming that was deliberate and wanted
it to be explained.  But maybe it wasn't deliberate?
Good catch! That wasn't a deliberate change, or at least I missed the
subtlety that the earlier code attempted to avoid it. fsnotify callbacks
are run under the srcu_read_lock. I don't think we want to run a fsync
under that if we can at all help it.

What we can probably do is unhash it and dequeue it from the LRU, and
then do a refcount_dec_and_test. If that comes back true, we can then
queue it to the nfsd_fcache_disposal infrastructure to be closed and
freed. I'll have a look at that tomorrow.
The movement of flush_delayed_fput() threw me at first, but I think I
understand it now - the new code for close_inode_sync is much cleaner,
not needing dispose_list_sync.
Yep, I think this is cleaner too.

-- 
Jeff Layton [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help