Thread (8 messages) 8 messages, 3 authors, 2016-02-23

Re: [PATCH v4] EXT4: optimizing group search for inode allocation

From: Andreas Dilger <hidden>
Date: 2016-01-11 21:13:53

On Jan 6, 2016, at 5:46 PM, Lokesh Jaliminche [off-list ref] wrote:
you are right flex groups cannot be completely empty as some
blocks are used for inode table and allocation bitmaps. I
missed that point. I have corrected that. Also I have tested
and validated this patch by putting some traces it works!
Good news.  Thanks for updating the patch and testing it.
More comments below.
quoted hunk ↗ jump to hunk
Here are corresponding logs:
commands:
=========
mkfs.ext4 -g 10000 -G 2 /dev/sdc
mount -t ext4  /dev/sdc /mnt/test_ext4/
cd /mnt/test_ext4
mkdir 1
cd 1

logs:
=====
Jan  7 03:36:45 root kernel: [   64.792939] max_usable_blocks_per_flex_group     =  [19692]
Jan  7 03:36:45 root kernel: [   64.792939] stats.free_clusters                  =  [19264]
Jan  7 03:36:45 root kernel: [   64.792939] Inodes_per_flex_group                =  [4864]
Jan  7 03:36:45 root kernel: [   64.792939] stats.free_inodes                    =  [4853]
Jan  7 03:36:45 root kernel: [   64.792950] max_usable_blocks_per_flex_group     =  [19692]
Jan  7 03:36:45 root kernel: [   64.792950] stats.free_clusters                  =  [19481]
Jan  7 03:36:45 root kernel: [   64.792950] Inodes_per_flex_group                =  [4864]
Jan  7 03:36:45 root kernel: [   64.792950] stats.free_inodes                    =  [4864]
Jan  7 03:36:45 root kernel: [   64.792958] max_usable_blocks_per_flex_group     =  [19692]
Jan  7 03:36:45 root kernel: [   64.792958] stats.free_clusters                  =  [19481]
Jan  7 03:36:45 root kernel: [   64.792958] Inodes_per_flex_group                =  [4864]
Jan  7 03:36:45 root kernel: [   64.792958] stats.free_inodes                    =  [4864]
Jan  7 03:36:45 root kernel: [   64.792965] max_usable_blocks_per_flex_group     =  [19692]
Jan  7 03:36:45 root kernel: [   64.792965] stats.free_clusters                  =  [19481]
Jan  7 03:36:45 root kernel: [   64.792965] Inodes_per_flex_group                =  [4864]
Jan  7 03:36:45 root kernel: [   64.792965] stats.free_inodes                    =  [4864]
Jan  7 03:36:45 root kernel: [   64.792972] max_usable_blocks_per_flex_group     =  [19692]
Jan  7 03:36:45 root kernel: [   64.792972] stats.free_clusters                  =  [19481]
Jan  7 03:36:45 root kernel: [   64.792972] Inodes_per_flex_group                =  [4864]
Jan  7 03:36:45 root kernel: [   64.792972] stats.free_inodes                    =  [4864]
Jan  7 03:36:45 root kernel: [   64.792979] max_usable_blocks_per_flex_group     =  [19692]<<<<<<<<<<<
Jan  7 03:36:45 root kernel: [   64.792979] stats.free_clusters                  =  [19692]<<<<<<<<<<<
Jan  7 03:36:45 root kernel: [   64.792979] Inodes_per_flex_group                =  [4864]<<<<<<<<<<<<
Jan  7 03:36:45 root kernel: [   64.792979] stats.free_inodes                    =  [4864]<<<<<<<<<<<<
Jan  7 03:36:45 root kernel: [   64.792985] found_flex_bg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>condition satisfied


[Patch v4]:
==========
From: Lokesh Nagappa Jaliminche <redacted>
Date: Thu, 7 Jan 2016 05:12:20 +0530
Subject: [PATCH] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" , "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <redacted>
---
fs/ext4/ext4.h   |    2 ++
fs/ext4/ialloc.c |   21 +++++++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cc7ca4e..fc48f9a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -326,6 +326,8 @@ struct flex_groups {
#define        EXT4_MAX_DESC_SIZE              EXT4_MIN_BLOCK_SIZE
#define EXT4_DESC_SIZE(s)              (EXT4_SB(s)->s_desc_size)
#ifdef __KERNEL__
+# define EXT4_INODE_TABLE_BLOCKS_PER_GROUP(s)  (EXT4_SB(s)->s_itb_per_group)
+# define EXT4_BLOCKS_PER_CLUSTER(s)    (EXT4_SB(s)->s_cluster_ratio)
I don't think there is much value to these macros.  They aren't much
shorter than the actual variable access, and they don't provide any
improved clarity for the reader.

I think they were introduced many years ago so that this header could
be shared between the kernel and e2fsprogs, but that is no longer true
so I don't think there is a need to add new ones.

Ted, any opinion on this?
quoted hunk ↗ jump to hunk
# define EXT4_BLOCKS_PER_GROUP(s)      (EXT4_SB(s)->s_blocks_per_group)
# define EXT4_CLUSTERS_PER_GROUP(s)    (EXT4_SB(s)->s_clusters_per_group)
# define EXT4_DESC_PER_BLOCK(s)                (EXT4_SB(s)->s_desc_per_block)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..dc66ad8 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -463,7 +463,6 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
                       sbi->s_log_groups_per_flex;
               parent_group >>= sbi->s_log_groups_per_flex;
       }
-
       freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
       avefreei = freei / ngroups;
Not sure why this line is being deleted?
quoted hunk ↗ jump to hunk
@@ -477,7 +476,20 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
            (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
               int best_ndir = inodes_per_group;
               int ret = -1;
-
+               /* blocks used for inode table and allocation bitmaps */
+               unsigned int max_usable_blocks_per_flex_group;
This could be shorter and still clear, like "max_blocks_per_flex".
+               unsigned int blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+               unsigned int inode_table_blocks_per_group = \
+                               EXT4_INODE_TABLE_BLOCKS_PER_GROUP(sb);
+               /* Number of blocks per cluster */
+               unsigned int cluster_ratio = EXT4_BLOCKS_PER_CLUSTER(sb);
There also isn't much value to making local variables that hold these
same values again.  It increases stack space for little benefit.  They
are only accessed once anyway.
+               unsigned int inodes_per_flex_group = inodes_per_group * \
+                                                       flex_size;
+               max_usable_blocks_per_flex_group = \
+                                       ((blocks_per_group*flex_size) - \
Spaces around that '*'.
+                                       (inode_table_blocks_per_group * \
+                                       flex_size + 2 * flex_size)) / \
This is C and not a CPP macro, so no need for linefeed escapes.
+                                       cluster_ratio;
This should be ">> EXT4_SB(sb)->s_cluster_bits" to avoid the divide.
quoted hunk ↗ jump to hunk
               if (qstr) {
                       hinfo.hash_version = DX_HASH_HALF_MD4;
                       hinfo.seed = sbi->s_hash_seed;
@@ -489,6 +501,11 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
               for (i = 0; i < ngroups; i++) {
                       g = (parent_group + i) % ngroups;
                       get_orlov_stats(sb, g, flex_size, &stats);
+                       /* can't get better group than empty group */
+                       if (max_usable_blocks_per_flex_group == \
+                               stats.free_clusters && \
No need for linefeed escapes.
+                               inodes_per_flex_group == stats.free_inodes)
Align continued line after '(' from the "if (" line.
+                                       goto found_flex_bg;
This is indented too much.  It will work properly when you fix the
previous line.
                       if (!stats.free_inodes)
                               continue;
                       if (stats.used_dirs >= best_ndir)
--
1.7.1

Regards,
Lokesh

On Sun, Jan 03, 2016 at 11:34:51AM -0700, Andreas Dilger wrote:
quoted
On Dec 26, 2015, at 01:41, Lokesh Jaliminche [off-list ref] wrote:
quoted

From 5199982d29ff291b181c25af63a22d6f2d6d3f7b Mon Sep 17 00:00:00 2001
From: Lokesh Nagappa Jaliminche <redacted>
Date: Sat, 26 Dec 2015 13:19:36 +0530
Subject: [PATCH v3] ext4: optimizing group search for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" and "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <redacted>
---
fs/ext4/ialloc.c |   12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..16f94db 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -473,10 +473,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
      ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

      if (S_ISDIR(mode) &&
-           ((parent == d_inode(sb->s_root)) ||
-            (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
+               ((parent == d_inode(sb->s_root)) ||
+               (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
I don't understand why you are changing the indentation here?
quoted
+               unsigned int inodes_per_flex_group;
+               unsigned long int clusters_per_flex_group;
              int best_ndir = inodes_per_group;
              int ret = -1;
+               inodes_per_flex_group = inodes_per_group * flex_size;
+               clusters_per_flex_group = sbi->s_clusters_per_group * flex_size;
Have you actually tested if this code works?  When I was looking at this
code I was accidentally looking at the ext2 version, which only deals with
individual groups, and not flex groups.  I don't think that a flex group
can actually be completely empty, since it will always contain at least
an inode table and some bitmaps.

This calculation needs to take this into account or it will just be
overhead and not an optimization. Something like:

               /* account for inode table and allocation bitmaps */
               clusters_per_flex_group = sbi->s_clusters_per_group * flex_size -
                       ((inodes_per_flex_group * EXT4_INODE_SIZE(sb) +
                         (flex_size << sb->s_blocksize_bits) * 2 ) >>
                       EXT4_SB(sb)->s_cluster_bits;

It would be worthwhile to print this out and verify that there are actually
groups with this many free blocks in a newly formatted filesystem.

Cheers, Andreas
quoted
              if (qstr) {
                      hinfo.hash_version = DX_HASH_HALF_MD4;
@@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
              for (i = 0; i < ngroups; i++) {
                      g = (parent_group + i) % ngroups;
                      get_orlov_stats(sb, g, flex_size, &stats);
+                       /* the group can't get any better than empty */
+                       if (inodes_per_flex_group == stats.free_inodes &&
+                               clusters_per_flex_group == stats.free_clusters)
+                               goto found_flex_bg;
                      if (!stats.free_inodes)
                              continue;
                      if (stats.used_dirs >= best_ndir)
--
1.7.1

quoted
On Mon, Dec 21, 2015 at 10:27:37AM -0700, Andreas Dilger wrote:
quoted
On Dec 18, 2015, at 4:32 PM, lokesh jaliminche [off-list ref] wrote:

From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
From: Lokesh Nagappa Jaliminche <redacted>
In the patch summary there is a typo - s/serch/search/

Also, it appears your email client has mangled the whitespace in the
patch.  Please use "git send-email" to send your patch to the list:

git send-email --to tytso@mit.edu --cc linux-ext4@vger.kernel.org HEAD~1

so that it arrives intact.  You probably need to set it up in ~/.gitconfig:

[sendemail]
      confirm = compose
      smtpdomain = {your client hostname}
      smtpserver = {your SMTP server hostname}

Cheers, Andreas
quoted
Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" and "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh N Jaliminche <redacted>
---
fs/ext4/ialloc.c |    8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..588bf8e 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block
*sb, struct inode *parent,
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t real_ngroups = ext4_get_groups_count(sb);
int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
+ unsigned int inodes_per_flex_group;
+ long unsigned int blocks_per_clustre;
unsigned int freei, avefreei, grp_free;
ext4_fsblk_t freeb, avefreec;
unsigned int ndirs;
@@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block
*sb, struct inode *parent,
percpu_counter_read_positive(&sbi->s_freeclusters_counter));
avefreec = freeb;
do_div(avefreec, ngroups);
+ inodes_per_flex_group = inodes_per_group * flex_size;
+ blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

if (S_ISDIR(mode) &&
@@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block
*sb, struct inode *parent,
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
+ /* the group can't get any better than empty */
+ if (inodes_per_flex_group == stats.free_inodes &&
+ blocks_per_clustre == stats.free_clusters)
+ goto found_flex_bg;
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
--
1.7.1
<0001-EXT4-optimizing-group-serch-for-inode-allocation.patch>

Cheers, Andreas

<0001-ext4-optimizing-group-serch-for-inode-allocation.patch>
<0001-ext4-optimizing-group-serch-for-inode-allocation.patch>

Cheers, Andreas




Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help