Re: [PATCH v4] EXT4: optimizing group search for inode allocation
From: Andreas Dilger <hidden>
Date: 2016-01-11 21:13:53
On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche [off-list ref] wrote:
you are right flex groups cannot be completely empty as some blocks are used for inode table and allocation bitmaps. I missed that point. I have corrected that. Also I have tested and validated this patch by putting some traces it works!
Good news. Thanks for updating the patch and testing it. More comments below.
quoted hunk ↗ jump to hunk
Here are corresponding logs: commands: ========= mkfs.ext4 -g 10000 -G 2 /dev/sdc mount -t ext4 /dev/sdc /mnt/test_ext4/ cd /mnt/test_ext4 mkdir 1 cd 1 logs: ===== Jan 7 03:36:45 root kernel: [ 64.792939] max_usable_blocks_per_flex_group = [19692] Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_clusters = [19264] Jan 7 03:36:45 root kernel: [ 64.792939] Inodes_per_flex_group = [4864] Jan 7 03:36:45 root kernel: [ 64.792939] stats.free_inodes = [4853] Jan 7 03:36:45 root kernel: [ 64.792950] max_usable_blocks_per_flex_group = [19692] Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_clusters = [19481] Jan 7 03:36:45 root kernel: [ 64.792950] Inodes_per_flex_group = [4864] Jan 7 03:36:45 root kernel: [ 64.792950] stats.free_inodes = [4864] Jan 7 03:36:45 root kernel: [ 64.792958] max_usable_blocks_per_flex_group = [19692] Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_clusters = [19481] Jan 7 03:36:45 root kernel: [ 64.792958] Inodes_per_flex_group = [4864] Jan 7 03:36:45 root kernel: [ 64.792958] stats.free_inodes = [4864] Jan 7 03:36:45 root kernel: [ 64.792965] max_usable_blocks_per_flex_group = [19692] Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_clusters = [19481] Jan 7 03:36:45 root kernel: [ 64.792965] Inodes_per_flex_group = [4864] Jan 7 03:36:45 root kernel: [ 64.792965] stats.free_inodes = [4864] Jan 7 03:36:45 root kernel: [ 64.792972] max_usable_blocks_per_flex_group = [19692] Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_clusters = [19481] Jan 7 03:36:45 root kernel: [ 64.792972] Inodes_per_flex_group = [4864] Jan 7 03:36:45 root kernel: [ 64.792972] stats.free_inodes = [4864] Jan 7 03:36:45 root kernel: [ 64.792979] max_usable_blocks_per_flex_group = [19692]<<<<<<<<<<< Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_clusters = [19692]<<<<<<<<<<< Jan 7 03:36:45 root kernel: [ 64.792979] Inodes_per_flex_group = [4864]<<<<<<<<<<<< Jan 7 03:36:45 root kernel: [ 64.792979] stats.free_inodes = [4864]<<<<<<<<<<<< Jan 7 03:36:45 root kernel: [ 64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied [Patch v4]: ========== From: Lokesh Nagappa Jaliminche <redacted> Date: Thu, 7 Jan 2016 05:12:20 +0530 Subject: [PATCH] ext4: optimizing group serch for inode allocation Added a check at the start of group search loop to avoid looping unecessarily in case of empty group. This also allow group search to jump directly to "found_flex_bg" with "stats" and "group" already set, so there is no need to go through the extra steps of setting "best_desc" , "best_group" and then break out of the loop just to set "stats" and "group" again. Signed-off-by: Lokesh Nagappa Jaliminche <redacted> --- fs/ext4/ext4.h | 2 ++ fs/ext4/ialloc.c | 21 +++++++++++++++++++-- 2 files changed, 21 insertions(+), 2 deletions(-)diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index cc7ca4e..fc48f9a 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h@@ -326,6 +326,8 @@ struct flex_groups {#define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE #define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size) #ifdef __KERNEL__ +# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_itb_per_group) +# define EXT4_BLOCKS_PER_CLUSTER(s) (EXT4_SB(s)->s_cluster_ratio)
I don't think there is much value to these macros. They aren't much shorter than the actual variable access, and they don't provide any improved clarity for the reader. I think they were introduced many years ago so that this header could be shared between the kernel and e2fsprogs, but that is no longer true so I don't think there is a need to add new ones. Ted, any opinion on this?
quoted hunk ↗ jump to hunk
# define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group) # define EXT4_CLUSTERS_PER_GROUP(s) (EXT4_SB(s)->s_clusters_per_group) # define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 1b8024d..dc66ad8 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c@@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, sbi->s_log_groups_per_flex; parent_group >>= sbi->s_log_groups_per_flex; } - freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter); avefreei = freei / ngroups;
Not sure why this line is being deleted?
quoted hunk ↗ jump to hunk
@@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) { int best_ndir = inodes_per_group; int ret = -1; - + /* blocks used for inode table and allocation bitmaps */ + unsigned int max_usable_blocks_per_flex_group;
This could be shorter and still clear, like "max_blocks_per_flex".
+ unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb); + unsigned int inode_table_blocks_per_group = \ + EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb); + /* Number of blocks per cluster */ + unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
There also isn't much value to making local variables that hold these same values again. It increases stack space for little benefit. They are only accessed once anyway.
+ unsigned int inodes_per_flex_group = inodes_per_group * \ + flex_size; + max_usable_blocks_per_flex_group = \ + ((blocks_per_group*flex_size) - \
Spaces around that '*'.
+ (inode_table_blocks_per_group * \ + flex_size + 2 * flex_size)) / \
This is C and not a CPP macro, so no need for linefeed escapes.
+ cluster_ratio;
This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
quoted hunk ↗ jump to hunk
if (qstr) { hinfo.hash_version = DX_HASH_HALF_MD4; hinfo.seed = sbi->s_hash_seed;@@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, for (i = 0; i < ngroups; i++) { g = (parent_group + i) % ngroups; get_orlov_stats(sb, g, flex_size, &stats); + /* can't get better group than empty group */ + if (max_usable_blocks_per_flex_group == \ + stats.free_clusters && \
No need for linefeed escapes.
+ inodes_per_flex_group == stats.free_inodes)
Align continued line after '(' from the "if (" line.
+ goto found_flex_bg;
This is indented too much. It will work properly when you fix the previous line.
if (!stats.free_inodes) continue; if (stats.used_dirs >= best_ndir) -- 1.7.1 Regards, Lokesh On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:quoted
On Dec 26, 2015, at 01:41, Lokesh Jaliminche [off-list ref] wrote:quoted
From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001 From: Lokesh Nagappa Jaliminche <redacted> Date: Sat, 26 Dec 2015 13:19:36 +0530 Subject: [PATCH v3] ext4: optimizing group search for inode allocation Added a check at the start of group search loop to avoid looping unecessarily in case of empty group. This also allow group search to jump directly to "found_flex_bg" with "stats" and "group" already set, so there is no need to go through the extra steps of setting "best_desc" and "best_group" and then break out of the loop just to set "stats" and "group" again. Signed-off-by: Lokesh Nagappa Jaliminche <redacted> --- fs/ext4/ialloc.c | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-)diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 1b8024d..16f94db 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c@@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter); if (S_ISDIR(mode) && - ((parent == d_inode(sb->s_root)) || - (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) { + ((parent == d_inode(sb->s_root)) || + (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {I don't understand why you are changing the indentation here?quoted
+ unsigned int inodes_per_flex_group; + unsigned long int clusters_per_flex_group; int best_ndir = inodes_per_group; int ret = -1; + inodes_per_flex_group = inodes_per_group * flex_size; + clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;Have you actually tested if this code works? When I was looking at this code I was accidentally looking at the ext2 version, which only deals with individual groups, and not flex groups. I don't think that a flex group can actually be completely empty, since it will always contain at least an inode table and some bitmaps. This calculation needs to take this into account or it will just be overhead and not an optimization. Something like: /* account for inode table and allocation bitmaps */ clusters_per_flex_group = sbi->s_clusters_per_group * flex_size - ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) + (flex_size << sb->s_blocksize_bits) * 2 ) >> EXT4_SB(sb)->s_cluster_bits; It would be worthwhile to print this out and verify that there are actually groups with this many free blocks in a newly formatted filesystem. Cheers, Andreasquoted
if (qstr) { hinfo.hash_version = DX_HASH_HALF_MD4;@@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent, for (i = 0; i < ngroups; i++) { g = (parent_group + i) % ngroups; get_orlov_stats(sb, g, flex_size, &stats); + /* the group can't get any better than empty */ + if (inodes_per_flex_group == stats.free_inodes && + clusters_per_flex_group == stats.free_clusters) + goto found_flex_bg; if (!stats.free_inodes) continue; if (stats.used_dirs >= best_ndir) --1.7.1quoted
On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:quoted
On Dec 18, 2015, at 4:32 PM, lokesh jaliminche [off-list ref] wrote: From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001 From: Lokesh Nagappa Jaliminche <redacted>In the patch summary there is a typo - s/serch/search/ Also, it appears your email client has mangled the whitespace in the patch. Please use "git send-email" to send your patch to the list: git send-email --to tytso@mit.edu --cc linux-ext4@vger.kernel.org HEAD~1 so that it arrives intact. You probably need to set it up in ~/.gitconfig: [sendemail] confirm = compose smtpdomain = {your client hostname} smtpserver = {your SMTP server hostname} Cheers, Andreasquoted
Added a check at the start of group search loop to avoid looping unecessarily in case of empty group. This also allow group search to jump directly to "found_flex_bg" with "stats" and "group" already set, so there is no need to go through the extra steps of setting "best_desc" and "best_group" and then break out of the loop just to set "stats" and "group" again. Signed-off-by: Lokesh N Jaliminche <redacted> --- fs/ext4/ialloc.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-)diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 1b8024d..588bf8e 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c@@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block*sb, struct inode *parent, struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_group_t real_ngroups = ext4_get_groups_count(sb); int inodes_per_group = EXT4_INODES_PER_GROUP(sb); + unsigned int inodes_per_flex_group; + long unsigned int blocks_per_clustre; unsigned int freei, avefreei, grp_free; ext4_fsblk_t freeb, avefreec; unsigned int ndirs;@@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block*sb, struct inode *parent, percpu_counter_read_positive(&sbi->s_freeclusters_counter)); avefreec = freeb; do_div(avefreec, ngroups); + inodes_per_flex_group = inodes_per_group * flex_size; + blocks_per_clustre = sbi->s_blocks_per_group * flex_size; ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter); if (S_ISDIR(mode) &&@@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block*sb, struct inode *parent, for (i = 0; i < ngroups; i++) { g = (parent_group + i) % ngroups; get_orlov_stats(sb, g, flex_size, &stats); + /* the group can't get any better than empty */ + if (inodes_per_flex_group == stats.free_inodes && + blocks_per_clustre == stats.free_clusters) + goto found_flex_bg; if (!stats.free_inodes) continue; if (stats.used_dirs >= best_ndir) -- 1.7.1 <0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>Cheers, Andreas<0001-ext4-optimizing-group-serch-for-inode-allocation.patch><0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
Cheers, Andreas
Attachments
- signature.asc [application/pgp-signature] 833 bytes