Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
From: NeilBrown <hidden>
Date: 2021-08-02 21:40:35
Also in: linux-fsdevel, linux-nfs
On Mon, 02 Aug 2021, Martin Steigerwald wrote:
> Hi Neil!
>
> Wow, this is a bit overwhelming for me. However, I got a very specific
> question for userspace developers in order to probably provide valuable
> input to the KDE Baloo desktop search developers:
>
> NeilBrown - 02.08.21, 06:18:29 CEST:
> > The "obvious" choice for a replacement is the file handle provided by
> > name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
> > supported by the filesystem). This returns an extensible opaque
> > byte-array. It is *already* more reliable than st_ino. Comparing
> > st_ino is only a reliable way to check if two files are the same if
> > you have both of them open. If you don't, then one of the files might
> > have been deleted and the inode number reused for the other. A
> > filehandle contains a generation number which protects against this.
> >
> > So I think we need to strongly encourage user-space to start using
> > name_to_handle_at() whenever there is a need to test if two things are
> > the same.
>
> How could that work for Baloo's use case to see whether a file it
> encounters is already in its database or whether it is a new file.
>
> Would Baloo compare the whole file handle or just certain fields or make
> a hash of the filehandle or what ever? Could you, in pseudo code or
> something, describe the approach you'd suggest. I'd then share it on:

Yes, the whole filehandle.
    struct file_handle {
        unsigned int  handle_bytes;   /* Size of f_handle [in, out] */
        int           handle_type;    /* Handle type [out] */
        unsigned char f_handle[0];    /* File identifier (sized by
                                         caller) [out] */
    };
i.e. compare handle_type, handle_bytes, and handle_bytes worth of
f_handle.
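
A rough sketch of that comparison in C, in case it helps. The helper names
get_handle() and same_handle() are made up for illustration, and the code
assumes glibc on Linux, where <fcntl.h> provides name_to_handle_at() and
MAX_HANDLE_SZ under _GNU_SOURCE. Treat it as a starting point rather than
a finished implementation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: fetch the file handle for `path`.
 * Returns a malloc()ed handle that the caller must free(), or NULL on
 * failure (for example when the filesystem does not support file
 * handles, in which case a caller could fall back to st_ino).
 */
static struct file_handle *get_handle(const char *path)
{
    int mount_id;   /* filled in by the kernel, not used in this sketch */
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (fh == NULL)
        return NULL;

    /* Tell the kernel how much room f_handle has. */
    fh->handle_bytes = MAX_HANDLE_SZ;

    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == -1) {
        free(fh);
        return NULL;
    }
    return fh;
}

/*
 * Hypothetical helper: nonzero when the two handles are identical,
 * i.e. same handle_type, same handle_bytes, and the same
 * handle_bytes worth of f_handle.
 */
static int same_handle(const struct file_handle *a,
                       const struct file_handle *b)
{
    assert(a != NULL && b != NULL);

    return a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

A caller would fetch a handle for each path with get_handle(), compare
them with same_handle(), and free() both afterwards. Hashing the handle
for a database lookup is fine as an optimisation, as long as the full
comparison above is still done on a hash match.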
This file_handle is local to the filesystem. Two different filesystems
can use the same filehandle for different files. So the identity of the
filesystem needs to be combined with the file_handle.
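
One possible shape for such a combined identity, sketched in C. The
struct file_id record and the helpers get_file_id() and same_file_id()
are hypothetical, and using statfs().f_fsid as the filesystem identity
is an assumption of this sketch; an application might prefer something
else, such as the mount point:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/vfs.h>

/* Hypothetical per-file record an indexer might store. */
struct file_id {
    fsid_t        fsid;                   /* filesystem identity (assumed: f_fsid) */
    int           handle_type;            /* copied from struct file_handle */
    unsigned int  handle_bytes;           /* number of valid bytes in handle[] */
    unsigned char handle[MAX_HANDLE_SZ];  /* copy of f_handle */
};

/* Hypothetical helper: fill *id for `path`. Returns 0 on success, -1 on error. */
static int get_file_id(const char *path, struct file_id *id)
{
    struct statfs sfs;
    struct file_handle *fh;
    int mount_id;   /* filled in by the kernel, not used in this sketch */

    if (statfs(path, &sfs) == -1)
        return -1;

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* statfs() follows symlinks, so ask for the same behaviour here. */
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id,
                          AT_SYMLINK_FOLLOW) == -1) {
        free(fh);
        return -1;
    }

    memset(id, 0, sizeof(*id));
    id->fsid = sfs.f_fsid;
    id->handle_type = fh->handle_type;
    id->handle_bytes = fh->handle_bytes;
    memcpy(id->handle, fh->f_handle, fh->handle_bytes);
    free(fh);
    return 0;
}

/* Hypothetical helper: nonzero when filesystem and handle both match. */
static int same_file_id(const struct file_id *a, const struct file_id *b)
{
    assert(a != NULL && b != NULL);

    return memcmp(&a->fsid, &b->fsid, sizeof(a->fsid)) == 0 &&
           a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->handle, b->handle, a->handle_bytes) == 0;
}
```

The record is a fixed size, so it could be stored as-is or hashed as a
database key, with same_file_id() used to confirm a match.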
> Bug 438434 - Baloo appears to be indexing twice the number of files than
> are actually in my home directory
> https://bugs.kde.org/438434
This bug wouldn't be addressed by using the filehandle. Using a
filehandle allows you to compare two files within a single filesystem.
This bug is about comparing two filesystems either side of a reboot, to
see if they are the same.

As has already been mentioned in that bug, statfs().f_fsid is the best
solution (unless comparing the mount point is satisfactory).

NeilBrown