Re: How capacious and well-indexed are ext4, xfs and btrfs directories?
From: Dave Chinner <david@fromorbit.com>
Date: 2021-05-17 23:22:45
Also in:
linux-btrfs, linux-fsdevel, linux-xfs
On Mon, May 17, 2021 at 04:06:58PM +0100, David Howells wrote:
> Hi,
>
> With filesystems like ext4, xfs and btrfs, what are the limits on
> directory capacity, and how well are they indexed?
>
> The reason I ask is that inside of cachefiles, I insert fanout
> directories inside index directories to divide up the space for ext2
> to cope with the limits on directory sizes and that it did linear
> searches (IIRC).
Don't do that for XFS. XFS directories have internal hashed btree
indexes that are far more space efficient than using fanout in
userspace. i.e. the XFS hash index uses 8 bytes per dirent, and so
with a 4kB directory block size it can index about 500 entries per
block. And being O(log N) for lookup, insert and remove, the fan-out
within the directory hash per IO operation is an order of magnitude
higher than using directories in userspace....

The capacity limit for XFS is 32GB of dirent data, which generally
equates to somewhere around 300-500 million dirents depending on
filename size. The hash index is separate from this limit (it has its
own 32GB address segment, as does the internal freespace map for the
directory)....

The other design characteristic of XFS directories is that readdir is
always a sequential read through the dirent data with built-in
readahead. It does not need to look up the hash index to determine
where to read the next dirents from - that's a straight "file offset
to physical location" lookup in the extent btree, which is always
cached in memory. So that's generally not a limiting factor, either.
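As a rough back-of-envelope check on the figures quoted above, here is a
minimal Python sketch. It is not a description of the XFS on-disk format:
the constants are simply the numbers from the mail, block headers are
ignored, and `index_levels` assumes a hypothetical tree with the same
fan-out at every level. All function names are made up for illustration.

```python
# Back-of-envelope arithmetic on the figures quoted in the mail.
# Not the XFS on-disk format: headers and per-block overhead are ignored.

HASH_ENTRY_BYTES = 8              # hash index bytes per dirent (from the mail)
DIR_BLOCK_BYTES = 4096            # 4kB directory block size (from the mail)
DIRENT_DATA_LIMIT = 32 * 2**30    # 32GB of dirent data (from the mail)


def index_entries_per_block(block_bytes=DIR_BLOCK_BYTES,
                            entry_bytes=HASH_ENTRY_BYTES):
    """Upper bound on index entries per block, ignoring block headers."""
    return block_bytes // entry_bytes


def index_levels(dirents, fanout):
    """Smallest number of levels such that fanout**levels >= dirents.

    Assumes a uniform fan-out at every level, which is a simplification.
    """
    levels, reach = 1, fanout
    while reach < dirents:
        reach *= fanout
        levels += 1
    return levels


def implied_bytes_per_dirent(dirent_count, limit=DIRENT_DATA_LIMIT):
    """Average dirent size implied by dirent_count entries filling the limit."""
    return limit / dirent_count


if __name__ == "__main__":
    print("index entries per 4kB block (upper bound):",
          index_entries_per_block())
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n} dirents -> {index_levels(n, 500)} index levels at fan-out 500")
    for n in (300_000_000, 500_000_000):
        print(f"{n} dirents -> {implied_bytes_per_dirent(n):.1f} bytes/dirent")
```

Under these assumptions, 512 is the ceiling behind "about 500 entries per
block", three levels at a fan-out of 500 cover both 1M and 10M entries, and
the 300-500 million range corresponds to roughly 69 to 115 bytes of dirent
data per entry on average.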
> For some applications, I need to be able to cache over 1M entries
> (render farm) and even a kernel tree has over 100k.
Not a problem for XFS with a single directory, but could definitely
be a problem for others especially as the directory grows and
shrinks. Last I measured, ext4 directory perf drops off at about
80-90k entries using 40 byte file names, but you can get an idea of
XFS directory scalability with large entry counts in commit
756c6f0f7efe ("xfs: reverse search directory freespace indexes").
I'll reproduce the table using a 4kB directory block size here:
File count      create time(sec) / rate (files/s)
   10k                0.41 / 24.3k
   20k                0.75 / 26.7k
  100k                3.27 / 30.6k
  200k                6.71 / 29.8k
    1M               37.67 / 26.5k
    2M               79.55 / 25.2k
   10M              552.89 / 18.1k
So that's single threaded file create, which shows the rough limits
of insert into the large directory. There really isn't a major
drop-off in performance until there are several million entries in
the directory. Remove is roughly the same speed for the same dirent
count.
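To make the shape of that drop-off easier to see, the following Python
sketch recomputes the create rate from the file counts and times in the
table and expresses each as a fraction of the best rate. It assumes "10k"
means 10,000 files (decimal units), which the mail does not state; the
names `CREATE_TIMES`, `create_rates` and `relative_to_peak` are
illustrative only.

```python
# (file count, create time in seconds) as quoted in the table.
# Assumption: "k" and "M" are decimal (10k == 10_000 files).
CREATE_TIMES = [
    (10_000, 0.41),
    (20_000, 0.75),
    (100_000, 3.27),
    (200_000, 6.71),
    (1_000_000, 37.67),
    (2_000_000, 79.55),
    (10_000_000, 552.89),
]


def create_rates(samples):
    """Return (file count, files created per second) for each sample."""
    return [(count, count / secs) for count, secs in samples]


def relative_to_peak(samples):
    """Return (file count, rate as a fraction of the highest rate)."""
    rates = create_rates(samples)
    peak = max(rate for _, rate in rates)
    return [(count, rate / peak) for count, rate in rates]


if __name__ == "__main__":
    for (count, rate), (_, frac) in zip(create_rates(CREATE_TIMES),
                                        relative_to_peak(CREATE_TIMES)):
        print(f"{count:>10} files  {rate / 1000:6.1f}k files/s  "
              f"{frac:5.0%} of peak")
```

Recomputed this way, the rates land within about 0.1k files/s of the quoted
column. The peak is at 100k entries, the 1M and 2M runs stay above 80% of
it, and the 10M run comes in at roughly 60%.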
> What I'd like to do is remove the fanout directories, so that for
> each logical "volume"[*] I have a single directory with all the files
> in it. But that means sticking massive amounts of entries into a
> single directory and hoping it (a) isn't too slow and (b) doesn't
> hit the capacity limit.
Note that if you use a single directory, you are effectively single
threading modifications to your file index. You still need to use
fanout directories if you want concurrency during modification for
the cachefiles index, but that's a different design criterion
compared to directory capacity and modification/lookup scalability.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com