Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export

[PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-27
[PATCH 01/11] VFS: show correct dev num in mountinfo · NeilBrown <hidden> · 2021-07-27
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · NeilBrown <hidden> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · Miklos Szeredi <miklos@szeredi.hu> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · NeilBrown <hidden> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · Miklos Szeredi <miklos@szeredi.hu> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · NeilBrown <hidden> · 2021-07-30
Re: [PATCH 01/11] VFS: show correct dev num in mountinfo · Miklos Szeredi <miklos@szeredi.hu> · 2021-07-30
A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Al Viro <viro@zeniv.linux.org.uk> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Amir Goldstein <amir73il@gmail.com> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Josef Bacik <josef@toxicpanda.com> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Qu Wenruo <hidden> · 2021-08-03
RE: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Frank Filz <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Martin Steigerwald <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · J. Bruce Fields <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · Patrick Goetz <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · J. Bruce Fields <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · J. Bruce Fields <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · J. Bruce Fields <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · NeilBrown <hidden> · 2021-08-02
Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues. · J. Bruce Fields <hidden> · 2021-08-03
[PATCH 02/11] VFS: allow d_automount to create in-place bind-mount. · NeilBrown <hidden> · 2021-07-27
[PATCH 03/11] VFS: pass lookup_flags into follow_down() · NeilBrown <hidden> · 2021-07-27
[PATCH 04/11] VFS: export lookup_mnt() · NeilBrown <hidden> · 2021-07-27
Re: [PATCH 04/11] VFS: export lookup_mnt() · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-30
Re: [PATCH 04/11] VFS: export lookup_mnt() · NeilBrown <hidden> · 2021-07-30
[PATCH 05/11] VFS: new function: mount_is_internal() · NeilBrown <hidden> · 2021-07-27
Re: [PATCH 05/11] VFS: new function: mount_is_internal() · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-28
Re: [PATCH 05/11] VFS: new function: mount_is_internal() · NeilBrown <hidden> · 2021-07-28
Re: [PATCH 05/11] VFS: new function: mount_is_internal() · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-30
[PATCH 06/11] nfsd: include a vfsmount in struct svc_fh · NeilBrown <hidden> · 2021-07-27
[PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-27
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · Amir Goldstein <amir73il@gmail.com> · 2021-07-28
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-29
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · Amir Goldstein <amir73il@gmail.com> · 2021-07-29
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · Miklos Szeredi <miklos@szeredi.hu> · 2021-08-06
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · Amir Goldstein <amir73il@gmail.com> · 2021-08-06
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · Miklos Szeredi <miklos@szeredi.hu> · 2021-08-06
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · J. Bruce Fields <hidden> · 2021-07-28
Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-28
[PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() · NeilBrown <hidden> · 2021-07-27
[PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-27
Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points. · J. Bruce Fields <hidden> · 2021-07-28
Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-28
Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points. · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-30
Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points. · NeilBrown <hidden> · 2021-07-30
[PATCH 10/11] btrfs: introduce mapping function from location to inum · NeilBrown <hidden> · 2021-07-27
[PATCH 11/11] btrfs: use automount to bind-mount all subvol roots. · NeilBrown <hidden> · 2021-07-27
[RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static · kernel test robot <hidden> · 2021-07-28
Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots. · kernel test robot <hidden> · 2021-07-28
Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots. · Christian Brauner <hidden> · 2021-07-28
Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots. · NeilBrown <hidden> · 2021-07-29
Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots. · Christian Brauner <hidden> · 2021-07-29
[btrfs] 5874902268: xfstests.btrfs.202.fail · kernel test robot <hidden> · 2021-07-31
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Wang Yugui <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Wang Yugui <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Neal Gompa <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · J. Bruce Fields <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Qu Wenruo <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Qu Wenruo <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Amir Goldstein <amir73il@gmail.com> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Qu Wenruo <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Qu Wenruo <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Qu Wenruo <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · J. Bruce Fields <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Josef Bacik <josef@toxicpanda.com> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Forza <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Zygo Blaxell <hidden> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Amir Goldstein <amir73il@gmail.com> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Graham Cobb <hidden> · 2021-07-29
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Wang Yugui <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · J. Bruce Fields <hidden> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Josef Bacik <josef@toxicpanda.com> · 2021-07-28
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · Al Viro <viro@zeniv.linux.org.uk> · 2021-07-30
Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly · NeilBrown <hidden> · 2021-07-30
[PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-13
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Josef Bacik <josef@toxicpanda.com> · 2021-08-13
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Goffredo Baroncelli <hidden> · 2021-08-15
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Roman Mamedov <hidden> · 2021-08-15
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Goffredo Baroncelli <hidden> · 2021-08-15
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-15
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Goffredo Baroncelli <hidden> · 2021-08-17
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-17
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Goffredo Baroncelli <hidden> · 2021-08-18
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-15
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Amir Goldstein <amir73il@gmail.com> · 2021-08-19
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-20
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Amir Goldstein <amir73il@gmail.com> · 2021-08-20
[PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-23
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Wang Yugui <hidden> · 2021-08-18
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-18
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Zygo Blaxell <hidden> · 2021-08-19
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-20
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Zygo Blaxell <hidden> · 2021-08-22
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-23
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · NeilBrown <hidden> · 2021-08-23
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Zygo Blaxell <hidden> · 2021-08-25
Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export · Wang Yugui <hidden> · 2021-08-23

From: Amir Goldstein <amir73il@gmail.com>
Date: 2021-08-20 06:23:28
Also in: linux-fsdevel, linux-nfs

On Fri, Aug 20, 2021 at 6:22 AM NeilBrown [off-list ref] wrote:

On Thu, 19 Aug 2021, Amir Goldstein wrote:

quoted

On Mon, Aug 16, 2021 at 1:21 AM NeilBrown [off-list ref] wrote:

quoted

There are a few ways to handle this more gracefully.

1/ We could get btrfs to hand out new filehandles as well as new inode
numbers, but still accept the old filehandles.  Then we could make the
inode number reported be based on the filehandle.  This would be nearly
seamless but rather clumsy to code.  I'm not *very* keen on this idea,
but it is worth keeping in mind.

So objects would change their inode number after nfs inode cache is
evicted and while nfs filesystem is mounted. That does not sound ideal.

No.  Almost all filehandle lookups happen in the context of some other
filehandle.  If the provided context is an old-style filehandle, we
provide an old-style filehandle for the lookup.  There is already code
in nfsd to support this (as we have in the past changed how filesystems
are identified).

I see. This is nice!
It almost sounds like "inode mapped mounts" :-)

It would only be if the mountpoint filehandle (which is fetched without
that context) went out of cache that inode numbers would change.  That
would mean that the filesystem (possibly an automount) was unmounted.
When it was remounted it could have a different device number anyway, so
having different inode numbers would be of little consequence.

quoted

But I am a bit confused about the problem.
If the export is of the btrfs root, then nfs client cannot access any
subvolumes (right?) - that was the bug report, so the value of inode
numbers in non-root subvolumes is not an issue.

Not correct.  All objects in the filesystem are fully accessible.  The
only problem is that some pairs of objects have the same inode number.
This causes some programs like 'find' and 'du' to behave differently to
expectations.  They will refuse to even look in a subvolume, because it
looks like doing so could cause an infinite loop.  The values of inode
numbers in non-root subvolumes is EXACTLY the issue.

Thanks for clarifying.

quoted

If export is of non-root subvolume, then why bother changing anything
at all? Is there a need to traverse into sub-sub-volumes?

quoted

2/ We could add a btrfs mount option to control whether the uniquifier
was set or not.  This would allow the sysadmin to choose when to manage
any breakage.  I think this is my preference, but Josef has declared an
aversion to mount options.

3/ We could add a module parameter to nfsd to control whether the
uniquifier is merged in.  This again gives the sysadmin control, and it
can be done despite any aversion from btrfs maintainers.  But I'd need
to overcome any aversion from the nfsd maintainers, and I don't know how
strong that would be yet. (A new export option isn't really appropriate.
It is much more work to add an export option than the add a mount option).

That is too bad, because IMO from users POV, "fsid=btrfsroot" or "cross-subvol"
export option would have been a nice way to describe and opt-in to this new
functionality.

But let's consider for a moment the consequences of enabling this functionality
automatically whenever exporting a btrfs root volume without "crossmnt":

1. Objects inside a subvol that are inaccessible(?) with current
nfs/nfsd without
    "crossmnt" will become accessible after enabling the feature -
this will match
    the user experience of accessing btrfs on the host

Not correct - as above.

quoted

2. The inode numbers of the newly accessible objects would not match the inode
    numbers on the host fs (no big deal?)

Unlikely to be a problem.  Inode numbers have no meaning beyond the facts
that:
  - they are stable for the lifetime of the object
  - they are unique within a filesystem (except btrfs lies about
    filesystems)
  - they are not zero

The facts only need to be equally true on the NFS server and client..

quoted

3. The inode numbers of objects in a snapshot would not match the inode
    numbers of the original (pre-snapshot) objects (acceptable tradeoff for
    being able to access the snapshot objects without bloating /proc/mounts?)

This also should not be a problem.  Files in different snapshots are
different things that happen to share storage (like reflinks).
Comparing inode numbers between places which report different st_dev
does not fit within the meaning of inode numbers.

quoted

4. The inode numbers of objects in a subvol observed via this "cross-subvol"
    export would not match the inode numbers of the same objects observed
    via an individual subvol export

The device number would differ too, so the relative values of the inode
numbers would be irrelevant.

quoted

5. st_ino conflicts are possible when multiplexing subvol id and inode number.
    overlayfs resolved those conflicts by allocating an inode number from a
    reserved non-persistent inode range, which may cause objects to change
    their inode number during the lifetime on the filesystem (sensible
tradeoff?)

I think that #4 is a bit hard to swallow and #3 is borderline acceptable...
Both and quite hard to document and to set expectations as a non-opt-in
change of behavior when exporting btrfs root.

IMO, an nfsd module parameter will give some control and therefore is
a must, but it won't make life easier to document and set user expectations
when the semantics are not clearly stated in the exports table.

You claim that "A new export option isn't really appropriate."
but your only argument is that "It is much more work to add
an export option than the add a mount option".

With all due respect, for this particular challenge with all the
constraints involved, this sounds like a pretty weak argument.

Surely, adding an export option is easier than slowly changing all
userspace tools to understand subvolumes? a solution that you had
previously brought up.

Can you elaborate some more about your aversion to a new
export option.

Export options are bits in a 32bit word - so both user-space and kernel
need to agree on names for them.  We are currently using 18, so there is
room to grow.  It is a perfectly reasonable way to implement sensible
features.  It is, I think, a poor way to implement hacks to work around
misfeatures in filesystems.

This is the core of my dislike for adding an export option.  Using one
effectively admits that what btrfs is doing is a valid thing to do.  I
don't think it is.  I don't think we want any other filesystem developer
to think that they can emulate the behaviour because support is already
provided.

If we add any configuration to support btrfs, I would much prefer it to
be implemented in fs/btrfs, and if not, then with loud warnings that it
works around a deficiency in btrfs.
  /sys/modules/nfsd/parameters/btrfs_export_workaround

Thanks for clarifying.
I now understand how "hacky" the workaround in nfsd is.

Naive question: could this behavior be implemented in btrfs as you
prefer, but in a way that only serves nfsd and NEW nfs mounts for
that matter (i.e. only NEW filehandles)?

Meaning passing some hint in ->getattr() and ->iterate_shared(),
sort of like idmapped mount does for uid/gid.

Thanks,
Amir.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help