Thread (121 messages) 121 messages, 20 authors, 2021-08-25

Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.

From: NeilBrown <hidden>
Date: 2021-08-02 21:25:08
Also in: linux-btrfs, linux-fsdevel

On Mon, 02 Aug 2021, Amir Goldstein wrote:
On Mon, Aug 2, 2021 at 8:41 AM NeilBrown [off-list ref] wrote:
quoted
On Mon, 02 Aug 2021, Al Viro wrote:
quoted
On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
quoted
It think we need to bite-the-bullet and decide that 64bits is not
enough, and in fact no number of bits will ever be enough.  overlayfs
makes this clear.
Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
early...
quoted
So I think we need to strongly encourage user-space to start using
name_to_handle_at() whenever there is a need to test if two things are
the same.
... and forgetting the inconvenient facts, such as that two different
fhandles may correspond to the same object.
Can they?  They certainly can if the "connectable" flag is passed.
name_to_handle_at() cannot set that flag.
nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
perfect.  However it is the best that can be done over NFS.

Or is there some other situation where two different filehandles can be
reported for the same inode?

Do you have a better suggestion?
Neil,

I think the plan of "changing the world" is not very realistic.
I disagree.  It has happened before, it will happen again.  The only
difference about my proposal is that I'm suggesting the change be
proactive rather than reactive.
Sure, *some* tools can be changed, but all of them?
We only need to change the tools that notice there is a problem.  So it
is important to minimize the effect on existing tools, even when we
cannot reduce it to zero.  We then fix things that are likely to see a
problem, or that actually do.  And we clearly document the behaviour and
how to deal with it, for code that we cannot directly affect.

Remember: there is ALREADY breakage that has been fixed.  btrfs does
*not* behave like a "normal" filesystem.  Nor does NFS.  Multiple tools
have been adjusted to work with them.  Let's not pretend that will never
happen again, but instead use the dynamic to drive evolution in the way
we choose.
I went back to read your initial cover letter to understand the
problem and what I mostly found there was that the view of
/proc/x/mountinfo was hiding information that is important for
some tools to understand what is going on with btrfs subvols.
That was where I started, but not where I ended.  There are *lots* of
places that currently report inconsistent information for btrfs subvols.
Well I am not a UNIX history expert, but I suppose that
/proc/PID/mountinfo was created because /proc/mounts and
/proc/PID/mounts no longer provided tool with all the information
about Linux mounts.

Maybe it's time for a new interface to query the more advanced
sb/mount topology? fsinfo() maybe? With mount2 compatible API for
traversing mounts that is not limited to reporting all entries inside
a single page. I suppose we could go for some hierarchical view
under /proc/PID/mounttree. I don't know - new API is hard.
Yes, exactly - but not just for mounts.  Yes, we need new APIs (Because
the old ones have been broken in various ways).  That is exactly what
I'm proposing.  But "fixing" mountinfo turns out to be little more than
rearranging deck-chairs on the Titanic.
In any case, instead of changing st_dev and st_ino or changing the
world to work with file handles, why not add inode generation (and
maybe subvol id) to statx().
The enormous benefit of filehandles is that they are supported by
kernels running today.  As others have commented, they also work over
NFS.
But I would be quite happy to see more information made available
through statx - providing the meaning of that information was clearly
specified - both what can be assumed about it and what cannot.

Thanks,
NeilBrown

filesystem that care enough will provide this information and tools that
care enough will use it.

Thanks,
Amir.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help