Thread: 122 messages, 21 authors, 2021-08-25

Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

From: Al Viro <viro@zeniv.linux.org.uk>
Date: 2021-07-30 00:18:12
Also in: linux-fsdevel, linux-nfs

On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:
> I don't think anybody has that many file systems.  For btrfs it's a single
> file system.  Think of syncfs, it's going to walk through all of the super
> blocks on the system calling ->sync_fs on each subvol superblock.  Now this
> isn't a huge deal, we could just have some flag that says "I'm not real" or
> even just have anonymous superblocks that don't get added to the global
> super_blocks list, and that would address my main pain points.

Umm...  Aren't the snapshots read-only by definition?

> The second part is inode reclaim.  Again this particular problem could be
> avoided if we had an anonymous superblock that wasn't actually used, but the
> inode lru is per superblock.  Now with reclaim instead of walking all the
> inodes, you're walking a bunch of super blocks and then walking the list of
> inodes within those super blocks.  You're burning CPU cycles because now
> instead of getting big chunks of inodes to dispose, it's spread out across
> many super blocks.
>
> The other weird thing is the way we apply pressure to shrinker systems.  We
> essentially say "try to evict X objects from your list", which means in this
> case with lots of subvolumes we'd be evicting waaaaay more inodes than you
> were before, likely impacting performance where you have workloads that have
> lots of files open across many subvolumes (which is what FB does with its
> containers).
>
> If we want an anonymous superblock per subvolume then the only way it'll work
> is if it's not actually tied into anything, and we still use the primary
> super block for the whole file system.  And if that's what we're going to do
> what's the point of the super block exactly?  This approach that Neil's come
> up with seems like a reasonable solution to me.  Christoph gets his
> separation and /proc/self/mountinfo, and we avoid the scalability headache
> of a billion super blocks.  Thanks,

AFAICS, we also get arseloads of weird corner cases - in particular, Neil's
suggestions re visibility in /proc/mounts look rather arbitrary.

Al, really disliking the entire series...