Thread: 20 messages, 9 authors, 2021-06-22

Re: How capacious and well-indexed are ext4, xfs and btrfs directories?

From: Andreas Dilger <hidden>
Date: 2021-05-21 05:13:36
Also in: linux-btrfs, linux-fsdevel, linux-xfs

On May 17, 2021, at 9:06 AM, David Howells [off-list ref] wrote:
> With filesystems like ext4, xfs and btrfs, what are the limits on directory
> capacity, and how well are they indexed?
>
> The reason I ask is that inside of cachefiles, I insert fanout directories
> inside index directories to divide up the space for ext2 to cope with the
> limits on directory sizes and that it did linear searches (IIRC).
>
> For some applications, I need to be able to cache over 1M entries (render
> farm) and even a kernel tree has over 100k.
>
> What I'd like to do is remove the fanout directories, so that for each logical
> "volume"[*] I have a single directory with all the files in it.  But that
> means sticking massive amounts of entries into a single directory and hoping
> it (a) isn't too slow and (b) doesn't hit the capacity limit.

Ext4 can comfortably handle ~12M entries in a single directory, if the
filenames are not too long (e.g. 32 bytes or so).  With the "large_dir"
feature (since 4.13, but not enabled by default) a single directory can
hold around 4B entries, basically all the inodes of a filesystem.

There are performance knees as the index grows to a new level (at roughly
50k and 10M entries, depending on filename length).
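
As a rough illustration of where figures of this size come from, here is a
back-of-envelope sketch in Python. Every constant in it is an assumption made
for the example (4 KiB blocks, 8-byte index entries, an 8-byte directory entry
header, names padded to 4 bytes, leaf blocks about 75% full), and it ignores
block headers and checksum tails, so treat the output as order-of-magnitude
only and not as the filesystem's exact limits.

```python
# Rough estimate of how many entries a hashed directory index can address.
# All constants are illustrative assumptions, not values read from a real
# filesystem; block headers and checksum tails are ignored.
BLOCK_SIZE = 4096      # assumed filesystem block size in bytes
DX_ENTRY_SIZE = 8      # assumed size of one index entry (hash + block number)
DIRENT_HEADER = 8      # assumed fixed part of a directory entry


def dirent_size(name_len):
    """Approximate on-disk size of one entry: header plus name padded to 4 bytes."""
    return DIRENT_HEADER + (name_len + 3) // 4 * 4


def entries_per_leaf(name_len, fill=1.0):
    """Entries that fit in one leaf block at the given average fill ratio."""
    return int((BLOCK_SIZE // dirent_size(name_len)) * fill)


def htree_capacity(name_len, levels, fill=0.75):
    """Entries addressable with `levels` index levels above the leaf blocks."""
    fanout = BLOCK_SIZE // DX_ENTRY_SIZE
    return fanout ** levels * entries_per_leaf(name_len, fill)


if __name__ == "__main__":
    for levels in (1, 2, 3):
        print(levels, htree_capacity(32, levels))
```

Under these assumptions, 32-byte names give tens of thousands of entries with
one index level and on the order of ten million with two, which is roughly in
line with the knees mentioned above. A third level (assuming that is what
"large_dir" adds) addresses more entries than a 32-bit inode count allows,
which would be consistent with the ~4B figure being bounded by the number of
inodes.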

As described elsewhere in the thread, allowing concurrent create and unlink
in a directory (rename probably not needed) would be invaluable for scaling
multi-threaded workloads.  Neil Brown posted a prototype patch to add this
to the VFS for NFS:

https://lore.kernel.org/lustre-devel/8736rsbdx1.fsf@notabene.neil.brown.name/

Maybe it's time to restart that discussion?

Cheers, Andreas



