Thread (58 messages) 58 messages, 9 authors, 2013-09-06

Re: [PATCH 01/23] radix-tree: implement preload for multiple contiguous elements

From: Jan Kara <jack@suse.cz>
Date: 2013-08-05 11:17:39
Also in: linux-fsdevel, lkml

On Sun 04-08-13 05:17:03, Kirill A. Shutemov wrote:
From: "Kirill A. Shutemov" <redacted>

The radix tree is variable-height, so an insert operation not only has
to build the branch to its corresponding item, it also has to build the
branch to existing items if the size has to be increased (by
radix_tree_extend).

The worst case is a zero height tree with just a single item at index 0,
and then inserting an item at index ULONG_MAX. This requires 2 new branches
of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.

Radix tree is usually protected by spin lock. It means we want to
pre-allocate required memory before taking the lock.

Currently radix_tree_preload() only guarantees enough nodes to insert
one element. It's a hard limit. For transparent huge page cache we want
to insert HPAGE_PMD_NR (512 on x86-64) entries to address_space at once.

This patch introduces radix_tree_preload_count(). It allows to
preallocate nodes enough to insert a number of *contiguous* elements.
The feature costs about 5KiB per-CPU, details below.

Worst case for adding N contiguous items is adding entries at indexes
(ULONG_MAX - N) to ULONG_MAX. It requires nodes to insert single worst-case
item plus extra nodes if you cross the boundary from one node to the next.

Preload uses per-CPU array to store nodes. The total cost of preload is
"array size" * sizeof(void*) * NR_CPUS. We want to increase array size
to be able to handle 512 entries at once.

Size of array depends on system bitness and on RADIX_TREE_MAP_SHIFT.

We have three possible RADIX_TREE_MAP_SHIFT:

 #ifdef __KERNEL__
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
 #else
 #define RADIX_TREE_MAP_SHIFT	3	/* For more stressful testing */
 #endif

On 64-bit system:
For RADIX_TREE_MAP_SHIFT=3, old array size is 43, new is 107.
For RADIX_TREE_MAP_SHIFT=4, old array size is 31, new is 63.
For RADIX_TREE_MAP_SHIFT=6, old array size is 21, new is 30.

On 32-bit system:
For RADIX_TREE_MAP_SHIFT=3, old array size is 21, new is 84.
For RADIX_TREE_MAP_SHIFT=4, old array size is 15, new is 46.
For RADIX_TREE_MAP_SHIFT=6, old array size is 11, new is 19.

On most machines we will have RADIX_TREE_MAP_SHIFT=6. In this case,
on 64-bit system the per-CPU feature overhead is
 for preload array:
   (30 - 21) * sizeof(void*) = 72 bytes
 plus, if the preload array is full
   (30 - 21) * sizeof(struct radix_tree_node) = 9 * 560 = 5040 bytes
 total: 5112 bytes

on 32-bit system the per-CPU feature overhead is
 for preload array:
   (19 - 11) * sizeof(void*) = 32 bytes
 plus, if the preload array is full
   (19 - 11) * sizeof(struct radix_tree_node) = 8 * 296 = 2368 bytes
 total: 2400 bytes

Since only THP uses batched preload at the moment, we disable (set max
preload to 1) it if !CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE. This can be
changed in the future.

Signed-off-by: Matthew Wilcox <redacted>
Signed-off-by: Kirill A. Shutemov <redacted>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 include/linux/radix-tree.h | 11 +++++++++++
 lib/radix-tree.c           | 41 ++++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 9 deletions(-)
...
quoted hunk ↗ jump to hunk
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 7811ed3..99ab73c 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -82,16 +82,24 @@ static struct kmem_cache *radix_tree_node_cachep;
  * The worst case is a zero height tree with just a single item at index 0,
  * and then inserting an item at index ULONG_MAX. This requires 2 new branches
  * of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.
+ *
+ * Worst case for adding N contiguous items is adding entries at indexes
+ * (ULONG_MAX - N) to ULONG_MAX. It requires nodes to insert single worst-case
+ * item plus extra nodes if you cross the boundary from one node to the next.
+ *
  * Hence:
  */
-#define RADIX_TREE_PRELOAD_SIZE (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MIN (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MAX \
+	(RADIX_TREE_PRELOAD_MIN + \
+	 DIV_ROUND_UP(RADIX_TREE_PRELOAD_NR - 1, RADIX_TREE_MAP_SIZE))
  Umm, is this really correct? I see two problems:
1) You may need internal tree nodes at various levels but you seem to
account only for the level 1.
2) The rounding doesn't seem right because RADIX_TREE_MAP_SIZE+2 nodes may
require 3 nodes at level 1 if the indexes are like:
i_0 | i_1 .. i_{RADIX_TREE_MAP_SIZE} | i_{RADIX_TREE_MAP_SIZE+1}
    ^                                ^
    node boundary                    node boundary

  Otherwise the patch looks good.

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help