Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

From: Zygo Blaxell <hidden>
Date: 2021-07-30 18:15:16
Also in: linux-fsdevel, linux-nfs

On Fri, Jul 30, 2021 at 03:09:12PM +0800, Qu Wenruo wrote:


On 2021/7/30 2:53 PM, NeilBrown wrote:

On Fri, 30 Jul 2021, Qu Wenruo wrote:

You mean like "du -x"?? Yes.  You would lose the misleading illusion
that there are multiple filesystems.  That is one user-expectation that
would need to be addressed before people opt in.

OK, I forgot it's an opt-in feature, so it has less of an impact.

The hope would have to be that everyone would eventually opt-in once all
issues were understood.

I'm really not familiar with NFS/VFS, so some of my ideas may sound
super crazy.

Is it possible for nfsd to detect such a "subvolume" concept on its
own, e.g. by checking st_dev and the fsid returned from statfs()?

Then if nfsd finds some boundary which has a different st_dev, but the same
fsid as its parent, then it knows it's a "subvolume"-like concept.

Then do some local inode number mapping inside nfsd?
For example, use the highest 20 bits for different subvolumes, and the
remaining 44 bits for real inode numbers.

Of course, this is still a workaround...
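
A minimal sketch of that idea, assuming nfsd could keep a small local index per detected subvolume. The function names and the `subvol_index` parameter are hypothetical, not nfsd interfaces; the 20/44 split is the one suggested above.

```python
SUBVOL_BITS = 20                 # split suggested in the message
INODE_BITS = 64 - SUBVOL_BITS    # 44 bits left for the real inode number


def looks_like_subvolume_boundary(parent_dev, child_dev, parent_fsid, child_fsid):
    """True when st_dev changes but the statfs() fsid does not."""
    return parent_dev != child_dev and parent_fsid == child_fsid


def map_inode(subvol_index, ino):
    """Pack a local subvolume index and a real inode number into 64 bits.

    subvol_index is a hypothetical small counter assigned by the server,
    not the btrfs subvolume ID itself.
    """
    if not 0 <= subvol_index < (1 << SUBVOL_BITS):
        raise OverflowError("subvolume index does not fit in 20 bits")
    if not 0 <= ino < (1 << INODE_BITS):
        raise OverflowError("inode number does not fit in 44 bits")
    return (subvol_index << INODE_BITS) | ino
```

The overflow checks are where this scheme breaks down: any inode number above 2^44 simply cannot be represented.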

Yes, it would certainly be possible to add some hacks to nfsd to fix the
immediate problem, and we could probably even create some well-defined
interfaces into btrfs to extract the required information so that it
wasn't too hackish.

Maybe that is what we will have to do.  But I'd rather not hack NFSD
while there is any chance that a more complete solution will be found.

I'm not quite ready to give up on the idea of squeezing all btrfs inodes
into a 64bit number space.  24bits of subvol and 40 bits of inode?
Make the split a mkfs or mount option?

Btrfs used to have a subvolume number limit in the past, for different
reasons.

In that case, subvolume number is limited to 48 bits, which is still too
large to avoid conflicts.

For inode number there is really no limit except the 256 ~ (U64)-256 limit.

Considering all these numbers are almost U64, conflicts would be
unavoidable AFAIK.

Maybe hand out inode numbers to subvols in 2^32 chunks so each subvol
(which has ever been accessed) has a mapping from the top 32 bits of the
objectid to the top 32 bits of the inode number.

We don't need something that is theoretically perfect (that's not
possible anyway as we don't have 64bits of device numbers).  We just
need something that is practical and scales adequately.  If you have
petabytes of storage, it is reasonable to spend a gigabyte of memory on
a lookup table(?).
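
One way to picture the chunk idea is as an in-memory table keyed by (subvol, top 32 bits of objectid). This is only a sketch under that assumption; the class and method names are invented for illustration and nothing here is persisted.

```python
CHUNK_BITS = 32  # each chunk covers 2^32 inode numbers


class ChunkMap:
    """Hand out 2^32-sized inode-number chunks to subvols on first access."""

    def __init__(self):
        # (subvol_id, objectid >> 32) -> top 32 bits of the exposed inode number
        self._chunks = {}
        self._next_chunk = 0

    def inode_number(self, subvol_id, objectid):
        key = (subvol_id, objectid >> CHUNK_BITS)
        chunk = self._chunks.get(key)
        if chunk is None:
            if self._next_chunk >= (1 << (64 - CHUNK_BITS)):
                raise OverflowError("no free 2^32 chunks left")
            chunk = self._next_chunk
            self._chunks[key] = chunk
            self._next_chunk += 1
        low = objectid & ((1 << CHUNK_BITS) - 1)
        return (chunk << CHUNK_BITS) | low
```

The table grows by one entry per (subvol, 2^32 range) ever accessed, which is the memory cost the paragraph above is weighing.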

Can such squishing-all-inodes-into-one-namespace work be done in a
more generic way? E.g., let each fs with a "subvolume"-like feature
provide the interface to do that.

If you know the highest subvol ID number, you can pack two integers into
one larger integer by reversing the bits of the subvol number and ORing
them with the inode number, i.e. 0x0080000000000300 is subvol 256
inode 768.

The subvol ID's grow left to right while the inode numbers grow right
to left.  You can have billions of inodes in a few subvols, or billions of
subvols with a few inodes each, and neither will collide with the other
until there are billions of both.
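
A small sketch of the packing, to make the example value concrete. The helper names are mine, and the overlap check is a per-pair simplification rather than the filesystem-wide tracking described in this message.

```python
def reverse64(x):
    """Reverse the bit order of a 64-bit unsigned integer."""
    if not 0 <= x < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    result = 0
    for _ in range(64):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def pack(subvol_id, ino):
    """Bit-reversed subvol ID in the high bits, inode number in the low bits."""
    if subvol_id.bit_length() + ino.bit_length() > 64:
        raise OverflowError("subvol and inode bits would overlap")
    return reverse64(subvol_id) | ino
```

With these definitions, `pack(256, 768)` gives `0x0080000000000300`, matching the example above.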

If the filesystem tracks the number of bits in the highest subvol ID
and the highest inode number, then the inode numbers can be decoded,
and collisions can be detected.  e.g. if the maximum subvol ID on the
filesystem is below 131072, it will fit in 17 bits, then we know bits
63-47 are the subvol ID and bits 46-0 are the inode.  When subvol 131072
is created, the number of subvol bits increases to 18, but if every inode
fits in 46 bits or fewer, we know that every existing inode has a 0 in
the 18th subvol ID bit of the inode number, so there is no ambiguity.
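
As a sketch of the decoding step, assuming the number of subvol bits is known. The function names are hypothetical, and `can_widen` only illustrates the "no ambiguity" condition from the paragraph above.

```python
def reverse64(x):
    """Reverse the bit order of a 64-bit unsigned integer."""
    result = 0
    for _ in range(64):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def unpack(packed, subvol_bits):
    """Split a packed number into (subvol_id, ino) given the subvol field width."""
    inode_bits = 64 - subvol_bits
    ino = packed & ((1 << inode_bits) - 1)
    upper = (packed >> inode_bits) << inode_bits  # clear the inode field
    return reverse64(upper), ino


def can_widen(max_ino, new_subvol_bits):
    """True if every existing inode number leaves the new subvol bit clear."""
    return max_ino.bit_length() <= 64 - new_subvol_bits
```

Decoding `0x0080000000000300` yields subvol 256, inode 768 with either 17 or 18 subvol bits, because the inode there needs far fewer than 46 bits.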

If you don't know the maximum subvol ID, you can guess based on the
position of the large run of zero bits in the middle of the integer--not
reliable, but good enough for a guess if you were looking at 'ls -li'
output (and wrote the inode numbers in hex).
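
The guess could look something like the following sketch, which treats the longest run of zero bits as the gap between the two fields. It is a heuristic of my own construction and, as noted above, not reliable.

```python
def reverse64(x):
    """Reverse the bit order of a 64-bit unsigned integer."""
    result = 0
    for _ in range(64):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def guess_split(packed):
    """Guess (subvol_id, ino) by splitting at the longest run of zero bits."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for bit in range(64):
        if (packed >> bit) & 1:
            run_len = 0
        else:
            if run_len == 0:
                run_start = bit
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    # Bits below the gap are taken as the inode, bits above as the subvol.
    ino = packed & ((1 << best_start) - 1)
    top = best_start + best_len
    upper = (packed >> top) << top
    return reverse64(upper), ino
```

It goes wrong whenever the longest zero run lies inside one of the fields rather than between them.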

In the pathological case (the maximum subvol ID and maximum inode number
require more than 64 total bits) we return ENOSPC.
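
A sketch of that check, assuming the two maxima are already known. The function name is invented, and raising `OSError` with `errno.ENOSPC` is just the Python analogue of returning ENOSPC.

```python
import errno


def check_fits(max_subvol_id, max_ino):
    """Raise ENOSPC if the two maxima cannot share one 64-bit number."""
    needed = max_subvol_id.bit_length() + max_ino.bit_length()
    if needed > 64:
        raise OSError(errno.ENOSPC,
                      "subvol ID and inode number need %d bits" % needed)
```

For example, a maximum subvol ID of 131071 (17 bits) with 47-bit inode numbers still fits, while creating subvol 131072 in that state would not.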

This can all be done when btrfs fills in an inode struct.  There's no need
to change the on-disk format, other than to track the highest inode and
subvol number.  btrfs can compute the maxima in reasonable but non-zero
time by searching trees on mount, so an incompatible disk format change
would only be needed to avoid making mount slower.

Despite that I still hope to have a way to distinguish the "subvolume"
boundary.

Packing the bits into a single uint64 doesn't help with this--it does
the opposite.  Subvol boundaries become harder to see without deliberate
checking (i.e. not the traditional parent.st_dev != child.st_dev test).

Judging from previous btrfs-related complaints, some users do want
"stealth" subvols whose boundaries are not accidentally visible, so the
new behavior could be a feature for someone.

If completely inside btrfs, it's pretty simple to locate a subvolume
boundary.
All subvolumes have the same inode number, 256.

Maybe we could reserve some special "squished" inode number to indicate
a boundary inside a filesystem.

E.g. reserve (u64)-1 as a special indicator for subvolume boundaries,
as most filesystems would have reserved the super high inode numbers anyway.

If we can make inode numbers unique, we can possibly leave the st_dev
changing at subvols so that "du -x" works as currently expected.

One thought I had was to use a strong hash to combine the subvol object
id and the inode object id into a 64bit number.  What is the chance of
a collision in practice :-)
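
To put rough numbers on that question, here is a sketch that uses BLAKE2b truncated to 64 bits as a stand-in for "a strong hash" (the choice of hash is my assumption, not something proposed in the thread), plus the usual birthday approximation.

```python
import hashlib
import math
import struct


def hashed_ino(subvol_id, objectid):
    """Combine two 64-bit IDs into one 64-bit number with a strong hash."""
    data = struct.pack("<QQ", subvol_id, objectid)
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def collision_probability(n, bits=64):
    """Birthday approximation: chance of any collision among n hashed values."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))
```

Under this approximation, about 2^32 inodes (roughly 4 billion) already give a collision chance near 39%, while a million inodes give a chance on the order of 10^-8.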

But with just 64bits, conflicts will happen anyway...

The collision rate might be low enough that we could just skip over the
colliding numbers, but we'd have to have some kind of in-memory collision
map to avoid slowing down inode creation (currently the next inode number
is more or less "++last_inode_number", and looking up inodes to see if
they exist first would slow down new file creation a lot).
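
A sketch of what skipping over colliding numbers might look like, with a plain Python set standing in for the in-memory collision map. The function and its `combine` parameter are hypothetical.

```python
def allocate_objectid(subvol_id, last_objectid, used, combine):
    """Pick the next objectid whose combined 64-bit number is not in use.

    used    -- in-memory set of combined numbers already handed out
    combine -- function (subvol_id, objectid) -> 64-bit combined number
    """
    objectid = last_objectid + 1
    while combine(subvol_id, objectid) in used:
        objectid += 1          # skip over a colliding number
    used.add(combine(subvol_id, objectid))
    return objectid
```

The set lookup keeps the common case close to "++last_inode_number", at the cost of holding every combined number in memory.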

Thanks,
Qu

Thanks,
NeilBrown
