Thread: 57 messages, 11 authors, 2012-03-18

Re: getdents - ext4 vs btrfs performance

From: Jacek Luczak <hidden>
Date: 2012-03-02 10:05:56
Also in: linux-ext4, linux-fsdevel, lkml

2012/3/1 Chris Mason [off-list ref]:
On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
quoted
You might try sorting the entries returned by readdir by inode number
before you stat them. This is a long-standing weakness in ext3/ext4,
and it has to do with how we added hashed tree indexes to directories
in (a) a backwards compatible way, that (b) was POSIX compliant with
respect to adding and removing directory entries concurrently with
reading all of the directory entries using readdir.
quoted
You might try compiling spd_readdir from the e2fsprogs source tree (in
the contrib directory):
quoted
http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
quoted
... and then using that as a LD_PRELOAD, and see how that changes
things.
quoted
The short version is that we can't easily do this in the kernel since
it's a problem that primarily shows up with very big directories, and
using non-swappable kernel memory to store all of the directory entries
and then sort them so they can be returned in inode number order just
isn't practical. It is something which can be easily done in userspace,
though, and a number of programs (including mutt for its Maildir
support) do so, and it helps greatly for workloads where you are
calling readdir() followed by something that needs to access the inode
(e.g., stat or unlink).
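
A minimal userspace sketch of the approach described above: read the
whole directory first, sort the entries by inode number, and only then
stat them. This is an illustration in Python, not the spd_readdir code;
the function name stat_in_inode_order is hypothetical.

```python
import os


def stat_in_inode_order(path):
    """Return (name, stat_result) pairs for `path`, stat'ing in inode order."""
    # Read every entry up front. scandir yields entries in whatever order
    # the filesystem returns them (hash order for an htree-indexed
    # ext3/ext4 directory).
    with os.scandir(path) as it:
        entries = list(it)

    # On Linux, DirEntry.inode() is taken from the directory entry itself,
    # so sorting does not cost an extra stat call per file.
    entries.sort(key=lambda e: e.inode())

    # Stat in ascending inode order, so inode table reads tend to be
    # sequential rather than scattered.
    return [(e.name, os.lstat(e.path)) for e in entries]
```

The trade-off is the one mentioned above: the full entry list is held in
memory, which is acceptable in userspace but not in unswappable kernel
memory.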
quoted
For reading the files, the acp program I sent him tries to do something
similar. I had forgotten about spd_readdir though, we should consider
hacking that into cp and tar.

One interesting note is the page cache used to help here. Picture two
tests:

A) time tar cf /dev/zero /home

and

cp -a /home /new_dir_in_new_fs
unmount / flush caches
B) time tar cf /dev/zero /new_dir_in_new_fs

On ext, the time for B used to be much faster than the time for A
because the files would get written back to disk in roughly htree order.
Based on Jacek's data, that isn't true anymore.

I've tested both. The subjects are acp and spd_readdir used with tar,
all on ext4:
1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readdir.png
3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png

The acp result looks much better than spd_readdir, but the directory
copy time with spd_readdir dropped to 52 min 39 s (30 min less).

-Jacek
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html