Re: [PATCH v2 1/3] mempool: add stack (lifo) mempool handler
From: Hunt, David <hidden>
Date: 2016-06-15 10:10:13
Hi Olivier,

On 23/5/2016 1:55 PM, Olivier Matz wrote:
Hi David,

Please find some comments below.

On 05/19/2016 04:48 PM, David Hunt wrote:
This is a mempool handler that is useful for pipelining apps, where the mempool cache doesn't really work; for example, where we have one core doing Rx (and alloc), and another core doing Tx (and return). In such a case, the mempool ring simply cycles through all the mbufs, resulting in an LLC miss on every mbuf allocated when the number of mbufs is large. A stack recycles buffers more effectively in this case.

v2: cleanup based on mailing list comments. Mainly removal of unnecessary casts and comments.

Signed-off-by: David Hunt <redacted>
---
 lib/librte_mempool/Makefile            |   1 +
 lib/librte_mempool/rte_mempool_stack.c | 145 +++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)
 create mode 100644 lib/librte_mempool/rte_mempool_stack.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index f19366e..5aa9ef8 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,6 +44,7 @@ LIBABIVER := 2
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_handler.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_default.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_stack.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
diff --git a/lib/librte_mempool/rte_mempool_stack.c b/lib/librte_mempool/rte_mempool_stack.c
new file mode 100644
index 0000000..6e25028
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_stack.c
@@ -0,0 +1,145 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.

Should be 2016?
Yup, will change.
...
+
+static void *
+common_stack_alloc(struct rte_mempool *mp)
+{
+	struct rte_mempool_common_stack *s;
+	unsigned n = mp->size;
+	int size = sizeof(*s) + (n+16)*sizeof(void *);
+
+	/* Allocate our local memory structure */
+	s = rte_zmalloc_socket("common-stack",

"mempool-stack" ?
Yes. Also, I think the names of the functions should be changed from common_stack_x to simply stack_x. The "common_" prefix does not add anything.
+			size,
+			RTE_CACHE_LINE_SIZE,
+			mp->socket_id);
+	if (s == NULL) {
+		RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
+		return NULL;
+	}
+
+	rte_spinlock_init(&s->sl);
+
+	s->size = n;
+	mp->pool = s;
+	rte_mempool_set_handler(mp, "stack");

rte_mempool_set_handler() is a user function; it should not be called here.
Sure, removed.
+
+	return s;
+}
+
+static int common_stack_put(void *p, void * const *obj_table,
+		unsigned n)
+{
+	struct rte_mempool_common_stack *s = p;
+	void **cache_objs;
+	unsigned index;
+
+	rte_spinlock_lock(&s->sl);
+	cache_objs = &s->objs[s->len];
+
+	/* Is there sufficient space in the stack ? */
+	if ((s->len + n) > s->size) {
+		rte_spinlock_unlock(&s->sl);
+		return -ENOENT;
+	}

The usual return value for a failing put() is ENOBUFS (see rte_ring).
Done.
After reading it, I realize that it's nearly identical to the code in "app/test: test external mempool handler":
http://patchwork.dpdk.org/dev/patchwork/patch/12896/

We should drop one of them. If this stack handler is really useful for a performance use case, it could go in librte_mempool. At first read, the code looks like a demo example: it uses a simple spinlock for concurrent accesses to the common pool. Maybe the mempool cache hides this cost; in that case, we could also consider removing the use of the rte_ring.
Unlike the code in the test app, the stack handler does not use a ring. It is intended for applications that do a lot of core-to-core transfers of mbufs. The test app was simply a demonstration of a minimal malloc mempool handler; this patch adds a new LIFO handler for general use.

Using mempool_perf_autotest, I see a 30% increase in throughput when the local cache is enabled. However, there is up to a 50% degradation when the local cache is NOT used, so it's not suitable for all situations. Still, with a 30% gain for the cached use case, I think it's worth including as an option for people to try if their use case suits it.
Do you have some performance numbers? Do you know if it scales with the number of cores?
A 30% gain when the local cache is used, and these numbers scale up with the number of cores on my test machine. It may perform even better in other use cases.
If we can identify the conditions where this mempool handler outperforms the default handler, it would be valuable to have them in the documentation.
I could certainly add this to the docs, and mention the recommendation to use the local cache.

Regards,
Dave.