Re: [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy
From: Hunt, David <hidden>
Date: 2016-06-24 15:56:42
Hi Jerin,

I just ran a couple of tests on this patch on the latest master head on a couple of machines: an older quad socket E5-4650 and a quad socket E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the cached tests.

This is purely the autotest comparison; I don't have traffic generator results. But based on the above, I don't think there are any performance issues with the patch.

Regards,
Dave.

On 24/5/2016 4:17 PM, Jerin Jacob wrote:
> On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
>> Hi Jerin,
>>
>> On 05/24/2016 04:50 PM, Jerin Jacob wrote:
>>> Signed-off-by: Jerin Jacob <redacted>
>>> ---
>>>  lib/librte_mempool/rte_mempool.h | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>> index ed2c110..ebe399a 100644
>>> --- a/lib/librte_mempool/rte_mempool.h
>>> +++ b/lib/librte_mempool/rte_mempool.h
>>> @@ -74,6 +74,7 @@
>>>  #include <rte_memory.h>
>>>  #include <rte_branch_prediction.h>
>>>  #include <rte_ring.h>
>>> +#include <rte_memcpy.h>
>>>
>>>  #ifdef __cplusplus
>>>  extern "C" {
>>> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>  		    unsigned n, __rte_unused int is_mp)
>>>  {
>>>  	struct rte_mempool_cache *cache;
>>> -	uint32_t index;
>>>  	void **cache_objs;
>>>  	unsigned lcore_id = rte_lcore_id();
>>>  	uint32_t cache_size = mp->cache_size;
>>> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>  	 */
>>>
>>>  	/* Add elements back into the cache */
>>> -	for (index = 0; index < n; ++index, obj_table++)
>>> -		cache_objs[index] = *obj_table;
>>> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>
>>>  	cache->len += n;
>>
>> The commit title should be "mempool" instead of "mbuf".
>
> I will fix it.
>
>> Are you seeing some performance improvement by using rte_memcpy()?
>
> Yes, in some cases. In the default case, the loop was replaced with memcpy by the compiler itself (gcc 5.3). But when I tried the external mempool manager patch, performance dropped by almost 800 Kpps. Debugging further, it turns out that an unrelated change in the external mempool manager patch was knocking out the memcpy. An explicit rte_memcpy brought back 500 Kpps. The remaining 300 Kpps drop is still unexplained (in my test setup, packets are in the local cache, so it must be something to do with a __mempool_put_bulk text alignment change or similar).
>
> Has anyone else observed a performance drop with the external pool manager?
>
> Jerin
>
>> Regards
>> Olivier
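
For readers following the thread without the DPDK tree at hand, the change under discussion can be sketched in plain C as below. This is only an illustration under stated assumptions, not the code in rte_mempool.h: `struct sketch_cache`, `SKETCH_CACHE_MAX`, `put_bulk_loop()`, `put_bulk_memcpy()` and `put_bulk_variants_match()` are hypothetical names, and the standard `memcpy()` stands in for `rte_memcpy()` so that the example builds without DPDK. It shows that the per-pointer loop and the single bulk copy leave the cache in the same state; it says nothing about the performance numbers quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-lcore cache; not the DPDK structure. */
#define SKETCH_CACHE_MAX 512

struct sketch_cache {
	unsigned len;                  /* number of pointers currently cached */
	void *objs[SKETCH_CACHE_MAX];  /* cached object pointers */
};

/* Loop form, as in the lines removed by the patch: one pointer per iteration. */
static void
put_bulk_loop(struct sketch_cache *cache, void * const *obj_table, unsigned n)
{
	void **cache_objs = &cache->objs[cache->len];
	unsigned index;

	assert(cache->len + n <= SKETCH_CACHE_MAX);
	for (index = 0; index < n; ++index, obj_table++)
		cache_objs[index] = *obj_table;
	cache->len += n;
}

/* Bulk form, as in the line added by the patch; memcpy() is an assumed
 * stand-in for rte_memcpy(). */
static void
put_bulk_memcpy(struct sketch_cache *cache, void * const *obj_table, unsigned n)
{
	void **cache_objs = &cache->objs[cache->len];

	assert(cache->len + n <= SKETCH_CACHE_MAX);
	memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
	cache->len += n;
}

/* Returns 1 if both forms leave two empty caches in the same state after
 * putting n pointers, 0 otherwise (including when n exceeds the capacity). */
static int
put_bulk_variants_match(unsigned n)
{
	static int backing[SKETCH_CACHE_MAX];  /* dummy objects to point at */
	void *table[SKETCH_CACHE_MAX];
	struct sketch_cache a = { 0 }, b = { 0 };
	unsigned i;

	if (n > SKETCH_CACHE_MAX)
		return 0;
	for (i = 0; i < n; i++)
		table[i] = &backing[i];

	put_bulk_loop(&a, table, n);
	put_bulk_memcpy(&b, table, n);

	return a.len == b.len &&
	       memcmp(a.objs, b.objs, sizeof(void *) * n) == 0;
}
```

Since the two forms are functionally equivalent, any difference between them comes down to the code the compiler generates, which is consistent with the thread's observation that gcc 5.3 may already turn the loop into a memcpy in the default build.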