Thread (122 messages) 122 messages, 21 authors, 2021-08-25

Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.

From: NeilBrown <hidden>
Date: 2021-08-02 21:40:35
Also in: linux-fsdevel, linux-nfs

On Mon, 02 Aug 2021, Martin Steigerwald wrote:
Hi Neil!

Wow, this is a bit overwhelming for me. However, I got a very specific 
question for userspace developers in order to probably provide valuable 
input to the KDE Baloo desktop search developers:

NeilBrown - 02.08.21, 06:18:29 CEST:
quoted
The "obvious" choice for a replacement is the file handle provided by
name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
supported by the filesystem).  This returns an extensible opaque
byte-array.  It is *already* more reliable than st_ino.  Comparing
st_ino is only a reliable way to check if two files are the same if
you have both of them open.  If you don't, then one of the files
might have been deleted and the inode number reused for the other.  A
filehandle contains a generation number which protects against this.

So I think we need to strongly encourage user-space to start using
name_to_handle_at() whenever there is a need to test if two things are
the same.
How could that work for Baloo's use case to see whether a file it 
encounters is already in its database or whether it is a new file.

Would Baloo compare the whole file handle or just certain fields or make a 
hash of the filehandle or what ever? Could you, in pseudo code or 
something, describe the approach you'd suggest. I'd then share it on:
Yes, the whole filehandle.

 struct file_handle {
        unsigned int handle_bytes; /* Size of f_handle [in, out] */
        int           handle_type;    /* Handle type [out] */
        unsigned char f_handle[0]; /* File identifier (sized by
                                     caller) [out] */
 };

i.e.  compare handle_type, handle_bytes, and handle_bytes worth of
f_handle.
This file_handle is local to the filesytem.  Two different filesystems
can use the same filehandle for different files.  So the identity of the
filesystem need to be combined with the file_handle.
Bug 438434 - Baloo appears to be indexing twice the number of files than 
are actually in my home directory

https://bugs.kde.org/438434
This bug wouldn't be address by using the filehandle.  Using a
filehandle allows you to compare two files within a single filesystem.
This bug is about comparing two filesystems either side of a reboot, to
see if they are the same.

As has already been mentioned in that bug, statfs().f_fsid is the best
solution (unless comparing the mount point is satisfactory).

NeilBrown
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help