Thread: 122 messages, 21 authors, 2021-08-25

Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.

From: J. Bruce Fields <hidden>
Date: 2021-08-02 22:14:40
Also in: linux-fsdevel, linux-nfs

On Tue, Aug 03, 2021 at 07:59:30AM +1000, NeilBrown wrote:
On Tue, 03 Aug 2021, J. Bruce Fields wrote:
quoted
On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
quoted
On Mon, 02 Aug 2021, J. Bruce Fields wrote:
quoted
On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
quoted
For btrfs, the "location" is root.objectid ++ file.objectid.  I think
the inode should become (file.objectid ^ swab64(root.objectid)).  This
will provide numbers that are unique until you get very large subvols,
and very many subvols.
If you snapshot a filesystem, I'd expect, at least by default, inodes in
the snapshot to stay the same as in the snapshotted filesystem.
As I said: we need to challenge and revise user-space (and meat-space)
expectations. 
The example that came to mind is people that export a snapshot, then
replace it with an updated snapshot, and expect that to be transparent
to clients.

Our client will error out with ESTALE if it notices an inode number
changed out from under it.
Will it?
See fs/nfs/inode.c:nfs_check_inode_attributes():

	if (nfsi->fileid != fattr->fileid) {
		/* Is this perhaps the mounted-on fileid? */
		if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
		    nfsi->fileid == fattr->mounted_on_fileid)
			return 0;
		return -ESTALE;
	}

--b.
If the inode number changed, then the filehandle would change.
Unless the filesystem were exported with subtree_check, the old filehandle
would continue to work (unless the old snapshot was deleted).  File-name
lookups from the root would find new files...

"replace with an updated snapshot" is no different from "replace with an
updated directory tree".  If you delete the old tree, then
currently-open files will break.  If you don't, you get a reasonably
clean transition.
quoted
I don't know if there are other such cases.  It seems like surprising
behavior to me, though.
If you refuse to risk breaking anything, then you cannot make progress.
Provided people can choose when things break and have advance
warning, they often cope remarkably well.

Thanks,
NeilBrown

quoted
--b.
quoted
In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
create a 'reflink' for a subtree (only works on subtrees that have been
correctly created with the poorly named "btrfs subvolume" command).

As with any reflink, the original has the same inode number that it did
before, the new version has a different inode number (though in current
BTRFS, half of the inode number is hidden from user-space, so it looks
like the inode number hasn't changed).