Thread (48 messages), 7 authors, 2016-07-01

Re: [PATCH v2 1/3] mempool: add stack (lifo) mempool handler

From: Hunt, David <hidden>
Date: 2016-06-15 10:10:13

Hi Olivier,

On 23/5/2016 1:55 PM, Olivier Matz wrote:
Hi David,

Please find some comments below.

On 05/19/2016 04:48 PM, David Hunt wrote:
This is a mempool handler that is useful for pipelining apps, where
the mempool cache doesn't really work - example, where we have one
core doing rx (and alloc), and another core doing Tx (and return). In
such a case, the mempool ring simply cycles through all the mbufs,
resulting in a LLC miss on every mbuf allocated when the number of
mbufs is large. A stack recycles buffers more effectively in this
case.

v2: cleanup based on mailing list comments. Mainly removal of
unnecessary casts and comments.

Signed-off-by: David Hunt <redacted>
---
  lib/librte_mempool/Makefile            |   1 +
  lib/librte_mempool/rte_mempool_stack.c | 145 +++++++++++++++++++++++++++++++++
  2 files changed, 146 insertions(+)
  create mode 100644 lib/librte_mempool/rte_mempool_stack.c
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index f19366e..5aa9ef8 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,6 +44,7 @@ LIBABIVER := 2
  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_stack.c
  # install includes
  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
  
diff --git a/lib/librte_mempool/rte_mempool_stack.c b/lib/librte_mempool/rte_mempool_stack.c
new file mode 100644
index 0000000..6e25028
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_stack.c
@@ -0,0 +1,145 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
Should be 2016?
Yup, will change.
...
+
+static void *
+common_stack_alloc(struct rte_mempool *mp)
+{
+	struct rte_mempool_common_stack *s;
+	unsigned n = mp->size;
+	int size = sizeof(*s) + (n+16)*sizeof(void *);
+
+	/* Allocate our local memory structure */
+	s = rte_zmalloc_socket("common-stack",
"mempool-stack" ?
Yes. Also, I think the names of the functions should be changed from
common_stack_x to simply stack_x. The "common_" does not add anything.
+			size,
+			RTE_CACHE_LINE_SIZE,
+			mp->socket_id);
+	if (s == NULL) {
+		RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
+		return NULL;
+	}
+
+	rte_spinlock_init(&s->sl);
+
+	s->size = n;
+	mp->pool = s;
+	rte_mempool_set_handler(mp, "stack");
rte_mempool_set_handler() is a user function, it should not be called here
Sure, removed.
+
+	return s;
+}
+
+static int common_stack_put(void *p, void * const *obj_table,
+		unsigned n)
+{
+	struct rte_mempool_common_stack *s = p;
+	void **cache_objs;
+	unsigned index;
+
+	rte_spinlock_lock(&s->sl);
+	cache_objs = &s->objs[s->len];
+
+	/* Is there sufficient space in the stack ? */
+	if ((s->len + n) > s->size) {
+		rte_spinlock_unlock(&s->sl);
+		return -ENOENT;
+	}
The usual return value for a failing put() is ENOBUFS (see in rte_ring).
Done.
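
As a rough illustration of that convention, here is a standalone sketch. It is
not the patch code: demo_stack, demo_stack_put() and demo_stack_get() are
hypothetical names, the capacity is fixed at 8, and the spinlock is left out for
brevity. put() refuses the whole batch with -ENOBUFS when it does not fit, and
get() is assumed to mirror rte_ring's dequeue by returning -ENOENT when too few
objects are available.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the handler's private stack (not the patch's struct). */
struct demo_stack {
	unsigned size;  /* capacity in objects, must be <= 8 in this sketch */
	unsigned len;   /* number of objects currently stored */
	void *objs[8];  /* fixed storage keeps the example self-contained */
};

/* Push n objects; if they do not all fit, push none and return -ENOBUFS. */
static int
demo_stack_put(struct demo_stack *s, void * const *obj_table, unsigned n)
{
	unsigned i;

	assert(s != NULL && obj_table != NULL);

	if (s->len + n > s->size)
		return -ENOBUFS;

	for (i = 0; i < n; i++)
		s->objs[s->len + i] = obj_table[i];
	s->len += n;
	return 0;
}

/* Pop n objects, most recently pushed first; -ENOENT if fewer than n are stored. */
static int
demo_stack_get(struct demo_stack *s, void **obj_table, unsigned n)
{
	unsigned i;

	assert(s != NULL && obj_table != NULL);

	if (n > s->len)
		return -ENOENT;

	for (i = 0; i < n; i++)
		obj_table[i] = s->objs[s->len - 1 - i];
	s->len -= n;
	return 0;
}
```

A real handler would take the lock around both the capacity check and the copy,
so that a failed put leaves the stack untouched.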
After reading it, I realize that it's nearly exactly the same code as
in "app/test: test external mempool handler".
http://patchwork.dpdk.org/dev/patchwork/patch/12896/

We should drop one of them. If this stack handler is really useful for
a performance use-case, it could go in librte_mempool. At first
read, the code looks like a demo example: it uses a simple spinlock for
concurrent accesses to the common pool. Maybe the mempool cache hides
this cost; in that case we could also consider removing the use of the
rte_ring.
Unlike the code in the test app, the stack handler does not use a ring.
It is intended for the case where applications do a lot of core-to-core
transfers of mbufs. The test app was only meant to demonstrate a simple
malloc-based mempool handler. This patch adds a new lifo handler for
general use.
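
To make the reuse argument concrete, here is a toy model rather than DPDK code.
POOL_SIZE, IN_FLIGHT and distinct_buffers_used() are made up for the example. It
ignores the per-lcore cache and real cache behaviour, and only counts how many
distinct buffers an rx/tx pipeline touches when the free list is a ring (FIFO)
versus a stack (LIFO).

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 64  /* hypothetical number of buffers in the pool */
#define IN_FLIGHT 4   /* hypothetical number of buffers held between rx and tx */

/*
 * Count the distinct buffers handed out over `rounds` steady-state
 * alloc/free pairs. lifo != 0 models a stack free list, lifo == 0 a ring.
 */
static unsigned
distinct_buffers_used(int lifo, unsigned rounds)
{
	int free_list[POOL_SIZE];       /* circular storage of free buffer ids */
	int in_flight[IN_FLIGHT];       /* buffers currently owned by the pipeline */
	unsigned char seen[POOL_SIZE];
	unsigned head = 0, count = POOL_SIZE, oldest = 0, distinct = 0;
	unsigned i, r;

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < POOL_SIZE; i++)
		free_list[i] = (int)i;

	for (r = 0; r < rounds + IN_FLIGHT; r++) {
		int buf;

		/* tx side: once the pipeline is full, return its oldest buffer */
		if (r >= IN_FLIGHT) {
			free_list[(head + count) % POOL_SIZE] = in_flight[oldest];
			count++;
		}

		/* rx side: take the newest free buffer (stack) or the oldest (ring) */
		if (lifo) {
			buf = free_list[(head + count - 1) % POOL_SIZE];
		} else {
			buf = free_list[head];
			head = (head + 1) % POOL_SIZE;
		}
		count--;

		in_flight[oldest] = buf;
		oldest = (oldest + 1) % IN_FLIGHT;

		if (!seen[buf]) {
			seen[buf] = 1;
			distinct++;
		}
	}
	return distinct;
}
```

In this model the ring ends up cycling through every buffer in the pool, while
the stack keeps reusing only as many buffers as are in flight. That is the
effect the handler relies on when the pool is much larger than the LLC.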

Using the mempool_perf_autotest, I see a 30% increase in throughput when
the local cache is enabled and used. However, there is up to a 50%
degradation when the local cache is NOT used, so the handler is not
suitable in all situations. Still, with a 30% gain for the cached
use-case, I think it's worth having in there as an option for people to
try if the use-case suits.

Do you have some performance numbers? Do you know if it scales
with the number of cores?
30% gain when local cache is used. And these numbers scale up with the
number of cores on my test machine. It may be better for other use cases.
If we can identify the conditions where this mempool handler
outperforms the default handler, it would be valuable to have them
in the documentation.
I could certainly add this to the docs, and mention the recommendation to
use local cache.

Regards,
Dave.